DOCUMENT RESUME

ED 152 859          95          TM 007 218

AUTHOR          Rose, Clare; Nyre, Glenn F.
TITLE           The Practice of Evaluation. ERIC/TM Report 65.
INSTITUTION     ERIC Clearinghouse on Tests, Measurement, and Evaluation, Princeton, N.J.
SPONS AGENCY    National Inst. of Education (DHEW), Washington, D.C.
REPORT NO       ERIC-TM-65
PUB DATE        Dec 77
NOTE            95p.
AVAILABLE FROM  ERIC Clearinghouse on Tests, Measurement, and Evaluation, Educational Testing Service, Princeton, New Jersey 08541 ($5.00)

EDRS PRICE      MF-$0.83 HC-$4.67 Plus Postage.
DESCRIPTORS     Bibliographic Citations; *Case Studies; *Curriculum Evaluation; Early Childhood Education; Elementary Secondary Education; Evaluation; *Evaluation Methods; Evaluators; Higher Education; *Models; *Needs Assessment; *Program Evaluation; Research Design; Research Methodology; Research Utilization; State of the Art Reviews; Theories
IDENTIFIERS     Information Analysis Products

ABSTRACT

The first half of this monograph provides an overview of the theoretical concerns of evaluators. Definitions are provided of accountability, measurement, assessment, evaluation research, formative and summative evaluation, goal-free evaluation, goal-based evaluation, and evaluation. Several models of evaluation are described and discussed, including the Countenance Model, by Robert Stake; several Goal Attainment Models; the Discrepancy Model by Malcolm Provus; the CIPP (context, input, process, product) Model by Egon Guba and Daniel Stufflebeam; and the decision-oriented model developed at UCLA's Center for the Study of Evaluation by Marvin Alkin. Scriven's Modus Operandi Method and the Adversary Approach to evaluation are also discussed. The chapter on evaluation designs describes experimental designs, quasi-experimental designs, and process evaluation. Holistic Evaluation and Transactional Evaluation are presented as integrated approaches to program evaluation. The second half of this monograph presents several case studies. They include evaluations of an equal educational opportunity program in the California Community Colleges, Project Head Start, a professional school curriculum, and public school curriculum; and needs assessments of a professional school and a faculty development program. The final chapter deals with the utilization of the results of an evaluation. A list of 112 bibliographical references is appended. (BH)



Reproductions supplied by EDRS are the best that can be made from the original document.






U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF EDUCATION POSITION OR POLICY.







THE PRACTICE OF EVALUATION



The Authors 

Clare Rose is president and Glenn Nyre is vice-president and executive director of the Evaluation and Training Institute, 11110 Ohio Avenue, Los Angeles, California 90025.



The material in this publication was prepared pursuant to a contract with the National Institute of Education, U.S. Department of Health, Education and Welfare. Contractors undertaking such projects under government sponsorship are encouraged to express freely their judgment in professional and technical matters. Prior to publication, the manuscript was submitted to qualified professionals for critical review and determination of professional competence. This publication has met such standards. Points of view or opinions, however, do not necessarily represent the official view or opinions of either these reviewers or the National Institute of Education.



ERIC Clearinghouse on Tests, Measurement, and Evaluation
Educational Testing Service
Princeton, NJ 08541

December 1977 






CONTENTS

PREFACE

INTRODUCTION

EDUCATIONAL EVALUATION: ISSUES AND TERMS
    The Problem of Definition
    A Definition of Evaluation

MODELS OF EVALUATION
    The Countenance Model
    Goal Attainment Models
    The Discrepancy Model
    The CIPP Model
    The CSE Model
    Some New Approaches

EVALUATION DESIGNS
    Quasi-Experimental Designs
    Experimental Designs
    Process Evaluation: the Other Extreme

INTEGRATED APPROACHES TO PROGRAM EVALUATION
    Holistic Evaluation
    Transactional Evaluation

CASE STUDIES
    The Evaluation of Social-Action Programs
    EOPS: A Case Study of Holistic Evaluation
    Project Head Start: A Case of What Went Wrong
    Curriculum Evaluation
    An Evaluation of a Professional School Curriculum
    An Evaluation of a Public School Curriculum
    Needs Assessments
    Summary

UTILIZATION, QUALITY, AND ETHICS

REFERENCES






PREFACE 

The literature of educational evaluation, consistent with its adolescence, seems to be smack in the middle of a growth spurt. The almost total paucity of such literature a decade ago has been supplanted by a goodly assortment of educational evaluation writings today. Unless, like the teenager it is, our evaluation literature suddenly stops growing, we can surely predict a geometric expansion of evaluation writings in the decades to come.

As usual, of course, tomorrow's evaluation literature will be markedly influenced by today's evaluation writers. Fortunately, Rose and Nyre have put together a monograph that should have a salutary influence on the literature to come. More immediately, it should prove useful to educators who are getting ready to wade into that real-world, cost-conscious, politicized, unpredictable maelstrom known as educational evaluation. It is a constant source of amusement to practicing educational evaluators that the uninitiated conceive of educational evaluation as largely an enterprise in which theoretical models are adroitly employed to cope with the realities of educational practice. After reading The Practice of Evaluation, it would be difficult to hold that view.

Rose and Nyre have divided their monograph into two essentially distinct segments, the first of which provides the reader with a succinct overview of the rudimentary theoretical concerns that educational evaluators have been tangling with for the past decade or so. For the beginner, this section will prove useful as an introduction to the field.

In the second, and to this reader the most interesting, section of the monograph, they describe a series of actual evaluations. These case studies are particularly intriguing because in all but two instances the authors are reporting on evaluations in which they personally took part. Few theoretical texts on evaluation can ever, with the candor employed here, capture so vividly the dilemmas faced by evaluators who are attempting to do an intellectually defensible job but must still tussle with the practicalities of life in the real world and all its pressures to compromise one's standards. Rose and Nyre offer us some useful insights into that world from the perspective of individuals operating a private evaluation agency.

The reader should become familiar with the theoretical discussions in the initial section of the book in order to make the subsequent case studies all the more meaningful. Interpreting real case studies according to theoretical propositions will, of course, make for difficult reading. But who ever said that educational evaluation ought to be easy?

W. James Popham

University of California, Los Angeles

and

Instructional Objectives Exchange

Any professional area that is so much avoided; that produces so many anxieties; that immobilizes the very people who want to avail themselves of it; that is incapable of operational definition, even by its most trained advocates, who in fact render bad advice to the practitioners who consult them; which is not effective in answering reasonable and important questions, and which has made little apparent effort to isolate and ameliorate its most serious problems, must indeed give us pause.

E. G. Guba

INTRODUCTION

Less than five years ago, our collection of non-journal works on evaluation consisted of a few well-worn monographs and even fewer books. Today, our file drawers and shelves are filled. There are well over a dozen hard-cover books complete with artist-designed jackets; most were written in the last two or three years. But, with all their instructional value, there is not one casebook among them that describes real-world evaluations in the context of recommended evaluation models and designs. After all the theory has been studied, and the methodologies learned, only such a book can provide guidance to fledgling evaluators (or even seasoned ones) in the practice of program evaluation.

Although we, too, felt compelled to deal with basic principles, procedures, and methodological issues (and the first part of this monograph is devoted to their treatment), they are presented primarily as a foundation for the case studies that follow and simply provide increased understanding of why the evaluators carried out their investigations as they did. The purpose of this monograph is to provide an overview of basic principles and procedures and a guide to the practice of evaluation.

We first entered the world of evaluation with about equal proportions of good intentions, graduate training in research methodology, experience in survey research, high hopes, and naivete. We were going to reform education through the wisdom and insight of our impeccably planned, exquisitely elegant evaluations.

The first evaluation we were asked to conduct involved a staff-training program for public school teacher specialists in an urban ghetto. The budget was minuscule, but we didn't care. When we asked about the purpose of the evaluation, we were told, "Every program should be evaluated." Here were our kind of people. They believed in the monumental, and essential, value of evaluation!

We developed (and even pretested) several forms of questionnaires and interview schedules. Because it was impossible to pin down any goals for the program, we were afraid we might overlook what could turn out to be valuable data. We spent long hours trying to figure out a way to cast the study into an experimental mode. But all of the teachers in the district office were going to participate (we had actually been called in before the program got under way, just as our professors had told us it should be done), and the idea of using a control group was ludicrous.

When we arrived at the site where the week-long program was to take place, we met the participants for the first time, saw the schedule of activities, and held our breath. As it turned out, there was no staff development program and there never was any plan for an evaluation. We had become pawns in a political confrontation between two ethnic groups who, in addition to warring against each other, had joined together to protest some of the district supervisor's policies. We had been [...] the final touch to distract the supervisor from the real purpose of the week: a showdown similar, at least in emotion, to the [...]. [...] this was a most unusual situation, but intent on our purpose we [...] the tensions all around us. Fortunately, we have never encountered a similar case since then. But we have found ourselves in many situations where we could not randomize or identify comparison groups, and input data were seldom available and [...] pleas for performance testing to obtain outcome data. And we have been asked to conduct "formative" evaluations long after programs have been [...].

Over the years we have learned that for every program that permits rigorous and systematic data collection based on defined and generally agreed upon program goals, there are many more that are hotbeds of controversy with different groups of people holding different goals for the program and seeking different information from the evaluation. For every program that permits randomized assignment to treatment and control groups, there are many more in which the real participants of the program are hard to identify, let alone cast in an experimental design. And finally, we suspect that for every evaluator engaged from a project's inception in a well-planned, well-funded, potentially significant evaluation study there are dozens more who find themselves faced with the task of evaluation in a far less ideal situation. These are the common problems encountered by people engaged in program evaluation. This monograph is addressed to them.

EDUCATIONAL EVALUATION: ISSUES AND TERMS

Evaluation is not a new concept; nor is it unique to education. Moses evaluated when he decided to risk the perils of foreign travel and led the people out of Egypt. David evaluated, albeit hurriedly, when he aimed the slingshot at Goliath's forehead. We all evaluate. Deciding whether to go to Europe or stay home and paint the house during summer vacation involves both affective and economic evaluation. When we go to the market to buy apples, we are evaluating as we select the largest, firmest, juiciest, and reddest (or greenest, depending upon your preference). Every time we make a decision more or less rationally, systematically weighing the advantages and disadvantages of the alternatives, we are engaging in evaluation.

Educational evaluation has an equally long history, dating back to 2000 B.C. when Chinese officials administered civil service examinations [...]. The first formal evaluation was conducted in the United States, in 1887 by Joseph Mayer Rice, a free-thinking pediatrician. Considered a landmark study, in contrast to the simplistic surveys and even more simplistic interpretations that were characteristic of the time, Rice developed his own spelling test and administered it to over thirty thousand students in a large metropolitan school district. He wanted to show that student achievement had no relationship to the amount of time students spent in what he felt were senseless and interminable spelling drills (111). Unfortunately, a sophisticated technology did not evolve as a result of Rice's study, and most of the activity conducted in the name of evaluation for the next 20 or 30 years consisted of giving school children a variety of tests in every different subject. Measurement, not evaluation, leaped ahead.

It was not until the 1930s, when another trailblazer by the name of Ralph Tyler demonstrated a new approach to evaluation in the Eight-Year Study of the Progressive Education Association, that the foundation was laid for the form of evaluation we know today. Tyler conceived of evaluation as the process of determining the degree to which the goals of a program have been achieved. And, to Tyler, goals and objectives had to be defined in behavioral terms. Goals were derived from three basic sources: students, society, and the subject matter. General goal statements were then analyzed within the context of the psychology of learning (Can they be attained by the target population?) and a philosophy of education (Are they worthwhile and compatible with the purpose of education?). The goals that remain after this screening are transformed into specific behavioral statements of objectives; the degree to which students attain these objectives at the end of a program is measured; and the results are used to judge the effectiveness of the program (96). Goal-attainment models of program evaluation are much in evidence today and form the base of many experimental studies.

Still, the demand for formal program evaluation was not ignited until after the launching of the first Russian satellite. Sputnik will probably be remembered in the education world less for its impact on the space program than for its launching of the educational reform movement. Both educational reform and evaluation owe the beginnings of their modern histories to the furor created by the Russian feat. Public outrage turned against the schools, and for the first time in American history, the quality of our most honored institution, the school system, was seriously questioned. In part because of this concern, and in part because of civil rights groups' demands for fair treatment of minority children in the schools, the federal government began to contribute a greater share of the schools' financial support, which up until [...] Education Act (ESEA) of 1965. [...] provided for thousands of [...] knowledgeable about [...] research [...] and, not surprisingly, the federal government [...]. Large-scale evaluations of federal programs [...]. Would-be evaluators [...] program goals [...] than the [...] itself.



The Problem of Definition

From these inauspicious beginnings emerged the field of evaluation as we know it today, a field that is characterized by confusion, conflict, controversy, and mistrust. Evaluators do not share a common philosophy, focus, or terminology. Fiercely loyal to different "schools" of evaluation, educators argue over goal-free, goal-based, and formative and summative evaluation. Even the most basic terms, such as measurement, assessment, and evaluation, are used interchangeably and often incorrectly. It is no wonder that in some quarters evaluation is not yet legitimized. In order to clarify some of the major evaluation terms with which the reader should be conversant, it will be helpful to examine their definitions before we proceed with our discussion.

Accountability: Accountability is concerned with furthering the educational effectiveness of school systems (3). The Random House Dictionary of the English Language shows the synonym of accountability to be "responsibility." Educational accountability thus represents the educators' acceptance of responsibility for the consequences of the educational system entrusted to them by the public. Evaluation is an intrinsic part of accountability. Program effectiveness must be evaluated to provide information for teachers, administrators and program directors, as well as legislators and other officials who allocate the funds for the programs and for the public who provides the funds through their tax dollars. Accountability is usually a condition requiring evaluation; but accountability is not equivalent to evaluation.

Measurement: As we said earlier, measurement is often equated with evaluation, since so many of the early evaluation reports consisted primarily of measurement data. But measurement is static: it is the act or process of determining the extent, dimensions, quantity, or capacity of something at one point in time. In education, measurement is the act of determining the extent to which an individual has learned or the degree to which an individual possesses a certain characteristic, ability, or talent. Measurement is usually part of the evaluation process, providing useful data for evaluation, but again, the two terms are not equivalent.

Assessment: Like measurement, the term assessment is often used interchangeably with evaluation, and several major evaluation projects have been referred to as "National Assessments." Assessment is really more akin to measurement, however, and refers to the process of gathering and collating the data. Anderson and associates (3) claim that assessment has a narrower meaning than evaluation and a broader meaning than measurement. In addition to the act of measurement, assessment involves the qualitative judgment of determining what and how to measure as well as the process of putting the data into an interpretable form.

Evaluation Research: Although many writers classify evaluation as a form of research, or conversely, view evaluation research as a specific method of evaluation, others make a sharp distinction between the two. Evaluation research is defined as the application of social science methods to discover information of importance to program practice and public policy (98). The critical distinction is that the evaluator doing evaluative research acts as an objective scientist, employing quantitative and reproducible techniques and eschewing judgment. Research is primarily concerned with the basic theory and design of a program over [...]. Evaluation may to some extent be concerned with basic theory and design, but its primary function is to appraise a program to determine its [...].

Formative and Summative Evaluation: Coined by Michael Scriven (76), these terms distinguish between the two basically different roles served by evaluation. Formative evaluation refers to those evaluations undertaken during the developmental process for the express purpose of guiding [...] program improvement. In a formative evaluation, the evaluator will gather specific data on various aspects or components of the program at several stages throughout the developmental phase in order to identify areas requiring improvement. This information provides the developer with empirical data to help determine where and how to revise the program and [...].

Summative evaluation, on the other hand, refers to the final evaluation of a program and is concerned with determining the worth of the overall program after it has been completed. The purpose of summative evaluation is to help make decisions regarding the program's future: its continuation, termination, replication and/or dissemination. Implicit within the terms formative and summative is another distinction, which refers to the evaluator's role. That is, because the purpose of formative evaluation is to improve, the formative evaluator becomes part of the developmental process and the task of formative evaluation can even be performed by the program developer. If a person other than the developer performs the work of formative evaluation, that person can work closely and collaboratively with the developer. The point is that there is no need to [...] objectivity in the formative stages of program development. The goal is improvement and both the developer and evaluator can be committed to that end. The summative evaluator is in a different position. Summative or final end-of-program evaluation demands an objective and impartial evaluation since the future of the program is at stake. The summative evaluator must be completely independent of the developer.

[...] importance than summative evaluation. Although these terms were developed for the evaluation of curriculum materials, they have been adopted by the educational community as part of the basic vocabulary of evaluation and are used to distinguish the two operations in any type of evaluation enterprise.

Goal-free Evaluation: Another term created by Scriven (77), goal-free evaluation is an approach that aims to ensure that evaluators pay attention to the actual outcomes of a program, intended as well as unanticipated, rather than just the quality of the program goals or the extent to which they have been achieved. Scriven was concerned that an evaluator would become preoccupied with goals and, consciously or unconsciously, ignore the wide range of actual outcomes which, intended or not, are nevertheless real. In the goal-free approach, the evaluator deliberately avoids gaining any knowledge of the program goals (a simple task in cases where program goals don't really exist), gathers data on the actual outcomes only, and then evaluates their importance. Goal-free evaluation was not conceptualized to replace goal-based evaluation, but to augment it and thus provide a more reliable and valid evaluation.

Goal-based Evaluation: Goal-based evaluations are evaluations that are based on the extent to which intended project goals have been achieved. As suggested by Scriven, this should be accompanied by an assessment of the quality of the goals established in the first place (76).

A Definition of Evaluation 

Finally, the most important term to define, and one of the most controversial, is the word evaluation itself. The attempt to clarify the meaning of evaluation is not an idle exercise. Quite the contrary. It is of major importance since no one is agreed upon a definition and the different definitions people accept carry with them different advantages and disadvantages, each affecting the way in which evaluators approach and carry out their tasks. For example, three definitions of evaluation have appeared at one time or another in its history: measurement; congruence between objectives and performance; and judgment (59). When measurement is accepted as the definition of evaluation, the evaluator's main task is to administer tests and gather measurements. The role of the evaluator is equivalent to that of psychometrist. If evaluation is defined as professional judgment, then a group of "experts" would observe a program in action and, subsequently, pronounce judgment expertly, an act reminiscent of accreditation procedures from whence the definition is derived.

Definitions of evaluation also provide the conceptual base for the models of evaluation, and, although there are still a few educators who subscribe to the measurement definition (23, 93), an examination of the literature and a review of the different models and classification schemes indicate that model builders and evaluation writers cluster around three major definitions: 1) those that define evaluation as an assessment of the discrepancy between objectives and performance (Metfessel and Michael; Provus; Stake; Tyler); 2) those that focus on outcomes and define evaluation as an assessment of outcomes, intended or otherwise (Popham; Scriven); and 3) those who are decision oriented, defining evaluation as the process of obtaining and providing information for decision makers (Alkin; Cronbach; Guba and Stufflebeam). Each of these "schools" of evaluation thought and the writings of their proponents will be discussed subsequently.

A central issue for all three groups is that of value. The advocates of judgment follow the dictionary definition, which states that "to evaluate is to ascertain the value of" (Random House Dictionary). Thus, Popham (60) speaks of formal evaluation as the "assessment of the worth of educational phenomena," and Scriven (76) goes further, suggesting that without judgment of merit, no evaluation has taken place. Glass similarly stresses that evaluation is an attempt to assess the worth or social utility of a thing, and Stake (83) specifies description and judgment as the two basic ingredients of evaluation. Dressel (21) broadens the definition to include process. To Dressel, evaluation is "both a judgment on the worth or impact of a program, procedure or individual and the process whereby that judgment is made." Others who support the judgment of merit position include Airasian (1), Sax (73), Suchman (91), Weiss (98, 99), and Wholey et al. (107).

At the other end of the spectrum are those who eschew a value orientation, viewing the function of evaluation instead within the context of decision making only. In this case, the evaluator gathers information concerning the relative advantages and disadvantages of various decision alternatives so that decisions can be made rationally and systematically. The uses to which evaluation information is actually put by decision makers are yet another matter, one that will be dealt with later. Guba and Stufflebeam (37) object to judgment or value definitions because they ignore the processes of arriving at the information. They suggest instead that "evaluation is the process of delineating, obtaining and providing useful information for judging decision alternatives." Along the same lines, Alkin (2) offers a somewhat longer and broader version, which includes identifying the decision areas as well as collecting and providing the information to decision makers.

Some who oppose the value dimensions are concerned that passing judgment will ultimately diminish the evaluator's access to data and evaluation will become even more suspect than it is now. Others, such as Guba and Stufflebeam, Provus, and Alkin, take the position that the act of judging or making the final determination of the worth or merit of an educational program or product is only within the purview of the decision maker, not the evaluator. Popham (60) refers to the three models upon which these definitions are based as "decision-facilitation models." Although they do involve the evaluator's use of judgment as well as a determination of whether the program goals have been attained, their orientation is toward servicing decision makers. "The orientation of these models is so overwhelmingly toward servicing educational decision-makers that some of their proponents conceive of the evaluator as the decision-maker's handmaiden/handmister" (60). Brief descriptions of these models are presented in the next section.

MODELS OF EVALUATION

Evaluation models are as prolific as rabbits, and they procreate about as speedily. No longer do people develop an idea or test an approach. Instead, they develop a model. Often spawned from combinations of several other models, some from other disciplines, they become progressively more grandiose in their complexity, more esoteric in their terminology and more pompous in their names. One has only to examine a recent program schedule for the American Educational Research Association's (AERA) annual meeting or the extensive Educational Resources Information Center (ERIC) abstracts on evaluation. The most frequently used paper title begins with the words "The Development of an Evaluation Model for ..."

The array of evaluation models from which we may choose would, if nothing else, provide a marvelous tongue-twisting party game. Just imagine what it would sound like if someone who'd had too much to drink were to chant in mantra form the names of evaluation models and approaches. We have democratic evaluation, responsive evaluation, transactional evaluation, modus operandi evaluation, holistic evaluation, discrepancy evaluation, goal-free evaluation, and adversary evaluation. There is the Countenance Model, the Differential Evaluation Model, the Priority Decision Model, the Trade-Off and Comparative Cost Model, the Systems Approach Model, and the Cost Utility Model. There are Ontological Models, Synergistic Models, and Ethnographic Models.* And this is only a partial list. Indeed, model building has become so commonplace that to be truly distinctive these days one should eschew model molding altogether.

Many of these so-called models, of course, are not really models, but rather, descriptions of processes or approaches to program evaluation. The purpose of a model is to guide and focus inquiry. Borich (7) indicates that models in the social sciences have three identifiable characteristics: precision, specificity, and verifiability. Models are precise because they are quantitative in nature. The elaborate forms of measurement are derived purposefully to describe the phenomena under investigation. Models are specific because they deal with only a certain number of phenomena. Models are verifiable in the sense that hypotheses are formulated and empirical evidence is accumulated that eventually determines the model's accuracy and usefulness. In listing the criteria for models, Carter (13) suggests that they must be efficient, heuristic, internally logical and complete; capable of being extended by empirical study; capable of helping the evaluator anticipate all of the information needs for decision making; and capable of relating elements in ways not previously related. Borich (7) hastens to add that while "evaluators strive to construct models that are precise, specific and verifiable, the end result often falls short of that which can be expected in the sciences." Models are, in effect, conceptualizations, and they may be theoretically sound; but they do not necessarily lend themselves to actual implementation.

*The models and their authors are listed at the end of this section to avoid interrupting the flow of the text.

A few models were no doubt built by Rube Goldberg fans intrigued by 
mazes of convoluted lines, arrows, and dots, and even the best of models are 
not perfect. Still, this should not deter would-be evaluators from having in 
their repertoire an understanding of the major evaluation models that have
been dominant in the literature and influential in the field. We will examine a
few of the important models that have guided evaluations during the last few
years.*

The Countenance Model

Created by Robert Stake (85), the Countenance Model is so named because
of the title of his article describing it ("The Countenance of Educational
Evaluation"). This model is based on the notion that judgment and
description are both essential to the evaluation of educational programs.
Accordingly, Stake distinguishes among three bodies of information that are
elements of evaluation statements and that should be included in both
descriptive and judgmental acts. These elements are antecedents, transactions,
and outcomes.

Antecedents refer to conditions existing prior to implementation of the
program that may relate to outcomes. Transactions are the "succession of
engagements" that constitute the process (in other words, the instructional
process or educational aspect of the program). Films, examinations,
homework, class discussions, and teachers' comments on student papers are all
examples of transactions. Outcomes, as conceived by Stake, refer to much
more than traditional student outcomes. They include immediate,
long-range, cognitive, affective, personal, and societal outcomes. Outcomes also
include the program's impact on teachers, administrators, and others, as well
as the wear and tear on equipment and facilities in its conduct.



*For comparative analyses of the different models, readers are referred to Worthen and Sanders' (111)
multipage descriptive matrix of models, Wetherill and Buttram's (105) comparison of 21 models, and Carter's (13)
taxonomy of decision-oriented evaluation models.

Descriptive information is classified either as intents or observations.
Intents include program objectives: not only intended student outcomes,
but also the planned-for environmental conditions. The judgment
matrix includes both the standards used to reach judgments and the actual
judgments themselves. A graphic representation of Stake's layout is presented in
Figure 1.

[Figure 1 layout: a RATIONALE box at the left; rows for ANTECEDENTS,
TRANSACTIONS, and OUTCOMES; columns for INTENTS and OBSERVATIONS
(DESCRIPTION MATRIX) and for STANDARDS and JUDGMENTS (JUDGMENT MATRIX).]

Figure 1. Layout of the Countenance Model*

Note that a separate box depicted to the left of the layout is labeled
rationale. According to Stake, an evaluation is not complete without a
statement of the program's rationale. This statement indicates the philosophical
background and basic purposes of the program and provides a basis for
evaluating intents.

There are two principal ways of processing descriptive evaluative data:
finding the contingencies among antecedents, transactions, and outcomes;
and finding the congruencies between intents and observations. The data for
a program are congruent if what was intended actually happened, although
Stake admits that it is unlikely that all of the intended antecedents,
transactions, and outcomes come to pass exactly as intended even in the best of
programs. With reference to transaction data, Stake insists that the
evaluator carefully observe and record data emerging from the transactional and
interactional classroom processes. He broadens the general concept of
outcome data to include future application, transfer, and the effect of process
on outcomes.

*W. James Popham, Educational Evaluation, © 1975, p. 31. Reprinted by permission of Prentice-Hall, Inc.,
Englewood Cliffs, N.J.

The contingencies among the variables are of special importance to the
evaluator. In the sense that evaluation is the search for relationships that
facilitate educational improvement, the countenance evaluator's task is to
identify outcomes that are contingent upon particular antecedent conditions
and instructional transactions.
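
As a rough illustration of congruence, the description matrix can be sketched as a small data structure. The sketch below is a toy under stated assumptions, not Stake's procedure: the `Cell` class, the `congruence` function, and the sample entries are all invented here, and exact string matching stands in for the evaluator's judgment about whether an intent was realized.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    intent: str       # what was planned
    observation: str  # what was actually recorded


# Rows of the description matrix; every entry is invented for illustration.
description_matrix = {
    "antecedents": Cell("students have completed algebra",
                        "students have completed algebra"),
    "transactions": Cell("weekly class discussions",
                         "discussions held twice a term"),
    "outcomes": Cell("gains on the unit test",
                     "gains on the unit test"),
}


def congruence(matrix):
    """Mark each row congruent when the observation matches the intent.

    Assumption: string equality is a stand-in for a human judgment.
    """
    return {row: cell.intent == cell.observation for row, cell in matrix.items()}


for row, is_congruent in congruence(description_matrix).items():
    print(row, "congruent" if is_congruent else "not congruent")
```

In this invented example only the transactions row is incongruent. Identifying contingencies among the rows remains a matter of judgment and is not modeled here.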

We previously stated that the foundation for a model's orientation derives
from the author's definition of evaluation. In this case, Stake is a proponent
of the value-judgment school; the model is judgmental, and the process of
judging the merit of a program is an integral part of the model. There are two
bases for judging the characteristics of a program in the Countenance Model:
evaluating a program on the basis of either absolute standards or relative
standards; that is, either standards reflecting personal opinion concerning
what the program should be or standards reflecting other similar programs.
Judgment is involved in choosing which set of standards to use, absolute or
relative, to obtain an overall rating of merit upon which to base
recommendations regarding the future of the program.

In later writing on "responsive evaluation," Stake (84) adds that rather
than personally passing judgment, the evaluator should collect samples of
the judgments of many people in the program: the clients, staff,
community, and others.* Stake's emphasis on the evaluator's need to be fully
aware of and sensitive to the concerns of many people affected by the
program became the central theme in several "process-only" evaluation
approaches discussed in the next chapter.

Goal Attainment Models 

Fathered by Ralph Tyler in the 1930s, goal-attainment or objectives-oriented
models still provide guidance for many evaluations and occupy an important
place in the literature. An example of a goal-attainment model is the
paradigm developed by Metfessel and Michael (54). The steps of their model are:

1. Involve members of the total community directly and indirectly as
participants in the evaluation;

2. Develop broad goals and specific operational objectives, both cognitive
and noncognitive;

3. Translate objectives into forms that are communicable and that can be
implemented to facilitate learning;

4. Develop criterion measures and instruments to determine whether the
program achieved the objectives;

5. Measure the program's progress toward attainment of the objectives
and, finally, measure attainment of the objectives;

6. Analyze the data;

7. Interpret the data in light of established standards and values; and

8. Formulate recommendations for program improvement as well as for
revisions in the goals and objectives.

*Many prominent evaluation theorists expanded the classic paradigm by broadening the definition of decision
maker and legitimizing data other than test scores, particularly the judgments of various people involved
directly and indirectly with the program (75).

The appendices to the article contain lists of criterion measures (for which
Metfessel and Michael have become better known than for their paradigm)
that can be used by the evaluator in the fourth step of the model. The
measures are wide-ranging, with those for determining student behavior
including self-inventories, standardized tests, rating scales, projective tests,
anecdotal records, and case histories. Measures are also provided for teacher
and community behavior.

Somewhat similar to Metfessel and Michael's strategy is one offered by
Robert Glaser (29). His scheme, which excludes summative evaluation,
consists of six steps that comprise a continuing cycle of formative
evaluation:

1. Specify the outcomes of learning in measurable terms;

2. Analyze the learners' entry behavior, that is, the level of knowledge, skill, or
ability already in the students' repertoire relevant to each task
specified in the objectives;

3. Provide students with various learning alternatives;

4. Monitor students' progress toward objectives;

5. Adjust the instructional program according to the level of students'
performance as they progress toward attainment of the objectives; and

6. Evaluate the program for ongoing feedback and program
improvement.

Glaser's paradigm is most suited to the evaluation of instructional
programs, although the strategy is generalizable to other program situations.
Glaser has been particularly effective in specifying the conditions necessary
for the evaluation of instruction, and his main contribution in this area is his
emphasis on detailed diagnosis of student (participant) entry behaviors, an
emphasis that is important in almost all program evaluations.

Despite their several advantages, goal-attainment models have drawn more
than a few criticisms. Scriven (76) was the first to caution against
indiscriminate goal-based evaluation without an accompanying evaluation of the
quality of the goals themselves: ". . . it is obvious that if the goals aren't
worth achieving then it is uninteresting how well they are achieved."
Unfortunately, many evaluators do not heed Scriven's advice, and the goals
established for a program often remain unscrutinized.

Another major problem with goal-based models is that in order to provide
an effective base for determining program results, program objectives must
be clear and specific. Rarely are evaluators afforded the luxury of explicit
program goals. More often than not, if they exist at all, the objectives are
vague, general, and too broad to provide a basis for comparing results.
Dressel (21) offers a reasonable explanation for the prevalence of globally
stated program objectives, simply stating that "it is far easier to generate
agreement among different constituent groups if an objective is vague."
Broad goals are seldom controversial. For example, few people would argue
if the goal of a program were to enhance students' self-confidence or
improve their ability to relate to people or other such incontrovertibly
inspiring goals. Agreement concerning the behaviors or attitudes that students
would have to demonstrate in order to show that they had indeed increased 
their self-confidence or their ability to relate to people would be far more 
difficult to obtain. In fact, whether or not objectives of this type can even be 
defined in specific measurable terms is itself a subject of great controversy. 

A third, frequently heard criticism of goal-based evaluations is that
focusing attention on the results of a program only in terms of its intended
objectives narrows the evaluation, so that the different procedures used to
achieve the results and their relationship to program outcomes are ignored.
Global judgments of merit, of course, can be made concerning the overall
value of the program as far as its success in achieving the objectives is
concerned, but no basis for program improvement (an equally important
part of evaluation) can be provided by the data. In other words, the
goal-attainment model is not decision oriented; only limited information can be
provided for decision makers. In decision-oriented models, the purpose of
evaluation is to provide information for decision makers for a multiplicity of
decisions: decisions concerning whether or not a program is needed in the
first place; decisions about whether to continue, expand, or terminate a
program; decisions concerning program certification or licensing; and
decisions about program improvement. The next two models that are described
qualify as decision-oriented models for program evaluation, an orientation
that is evident in the definitions of evaluation that provide the
base for their development.

The Discrepancy Model 

A very popular and widely used model is Malcolm Provus' Discrepancy
Model, so named because the discrepancy between performance and
standards is a key point in his definition of evaluation. Provus (64) defines
evaluation as:

. . . the process of 1) defining program standards; 2) determining whether a
discrepancy exists between some aspect of program performance and the
standards governing that aspect of the program; and 3) using discrepancy
information either to change performance or to change program standards.

Depending upon the information yielded as a result of the evaluation, 
there are four possible decisions to be made. The program can be 
terminated; it can be modified; it can continue or be repeated as is; or the
standards can be changed. 

The Discrepancy Model involves five stages, each of which involves a
comparison between reality, or performance, and standards. Discrepancies
are determined by examining the three content categories (input, process, 
and output) at each stage and comparing the program performance informa- 
tion with these defined standards at each stage. 

The design of the program is compared with design criteria; program
operations are compared against the input and process sections of the
program design; the degree to which interim objectives are achieved is
compared with the relationship between process and product; the achievement
of terminal objectives is compared with their specification in the program
design; and, finally, the cost of the program is compared against the cost of
other programs with similar goals.

The first stage focuses on the design and refers to the nature of the
program: its objectives, students, staff and other resources required for the
program, and the actual activities designed to promote attainment of the
objectives. The program design that emerges becomes the standard against
which the program is compared in the next stage.

The second stage, installation, involves determining whether an
implemented program is congruent with its implementation plan. Process is the
third stage, in which the evaluator serves in a formative role, comparing
performance with standards and focusing on the extent to which the interim
or enabling objectives have been achieved. The fourth stage, product, is
concerned with comparing actual attainments against the standards
(objectives) derived during Stage I and noting the discrepancies. The fifth and final
stage is concerned with the question of cost. A cost-benefit analysis is made
of the completed program and compared to other programs similar in nature.
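
The stage-by-stage comparison of performance against standards can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Provus' own procedure: the label `cost` for the fifth stage, the function `find_discrepancies`, and all sample figures are hypothetical.

```python
# Stage names follow the text; "cost" is a label assumed here for the fifth stage.
STAGES = ["design", "installation", "process", "product", "cost"]

# The four decisions the text says can follow from discrepancy information.
DECISIONS = ["terminate", "modify", "continue or repeat as is", "change standards"]


def find_discrepancies(standards, performance):
    """Return, for each stage, the items whose performance differs from the standard.

    Each gap is recorded as a (standard, observed) pair.
    """
    report = {}
    for stage in STAGES:
        expected = standards.get(stage, {})
        observed = performance.get(stage, {})
        gaps = {
            item: (target, observed.get(item))
            for item, target in expected.items()
            if observed.get(item) != target
        }
        if gaps:
            report[stage] = gaps
    return report


# Hypothetical program data, invented for illustration.
standards = {
    "installation": {"teachers_trained": 10, "materials_delivered": True},
    "product": {"students_meeting_objective": 40},
}
performance = {
    "installation": {"teachers_trained": 7, "materials_delivered": True},
    "product": {"students_meeting_objective": 40},
}

report = find_discrepancies(standards, performance)
print(report)
```

Choosing among the four decisions is left to the decision maker; the sketch only surfaces the discrepancy information on which that choice would rest.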

Because the primary function and orientation of the Discrepancy Model is
to provide information for decision makers, Popham classifies it in his
four-part model medley as a "decision-facilitation" model (60). But, as Popham
acknowledges, there is overlap between the categories, and the Discrepancy
Model is vulnerable to the same criticisms leveled at the goal-attainment
models.

The CIPP Model



One of the most well-known and widely used models is the CIPP Model
developed by Egon Guba and Daniel Stufflebeam (37). CIPP is an acronym
that stands for the four types of evaluation for which the model is
appropriate: context evaluation, input evaluation, process evaluation, and product
evaluation.

As noted earlier, the foundation for the development of a model is the
author's definition of evaluation, and for Guba and Stufflebeam "evaluation
is the process of delineating, obtaining and providing useful information for
judging decision alternatives."

This definition contains three important points. First, evaluation is a
systematic, continuing process. Secondly, the process includes three basic
steps: 1) delineating the questions to be answered; 2) obtaining relevant
information so that the questions may be answered; and 3) providing the
information for decision makers. Thirdly, evaluation serves decision making.
Although there is a judgmental component, the primary emphasis in this
model is on decision making. Basically, the CIPP model answers four
questions: 1) What objectives should be accomplished? 2) What procedures
should be followed in order to accomplish the objectives? 3) Are the
procedures working properly? and 4) Are the objectives being achieved?

The CIPP Model, pictured in Figure 2, distinguishes between four
different decision-making settings in education and four corresponding types of
decisions, in addition to the four types of evaluation that form the model's
name. The first distinction, that of decision-making settings, arises directly
as a consequence of the authors' definition of evaluation; that is, the
extensiveness of an evaluation, as well as the rigor with which it is conducted, are
determined in large measure by the importance of the decision that is to be
serviced. The importance of the decision, in turn, depends upon the
significance of the change it is intended to bring about. For example, decisions that
will have far-reaching consequences demand evaluations that are thorough,
rigorous, and, most likely, expensive. Decisions that will have little impact
on the people or the system, such as the decision to change the entrance of a
building, do not require expensive, detailed evaluations.

A second factor to be considered is the availability of information and the
decision maker's ability to use it. Evaluations must, of necessity, be more
extensive when there is little information already available or when the
decision maker is not able to make use of the available information in its present
form. These two factors (significance of the intended change, and the
availability of information together with the decision maker's ability to use it) form
two intersecting lines which, when combined, yield four classes of decision
settings. The continua are labeled "small versus large change" and "high
versus low understanding." The rule for distinguishing between small and
large change is the degree of controversy over the change. The more
controversial the change, the larger or more important it is. School integration is
a good example of a large, controversial change. Large changes usually
involve major restructuring within the educational system.

Figure 2. The CIPP Model*

Small changes, conversely, refer to changes that have no significant
impact on variables considered to be important by society. Thus, small
changes are relatively inconsequential and noncontroversial. Changing
textbooks, however, or adding curricular content are examples of small
changes that still require evaluative information for sound decisions.

The four decision settings are called homeostatic, incremental,
neomobilistic, and metamorphic, each referring to the extent of intended change.
Homeostatic decisions are aimed at maintaining the status quo and, not
surprisingly, are characteristic of most decisions that are made in education.
Faculty assignments and course scheduling are examples of homeostatic
decisions. Incremental decisions refer to developmental activities, particularly
those conducted as a part of continuous program improvement. Contrary to
their creators' view, many innovations in education are examples of
incremental activities: attempts to make some improvement without risking a
major upheaval.



*Source: Phi Delta Kappa, National Study Committee on Evaluation. Educational evaluation and decision
making. Ithaca, Ill.: Peacock Press, 197 . Reprinted by permission of Phi Delta Kappa, Incorporated.

Neomobilistic decisions denote large, innovative activities conducted for
the purpose of solving significant problems. Policy research centers and
institutes that deal with long-range educational planning are engaged in
neomobilistic decision making. Metamorphic decision making aims
to produce complete changes in an educational system. Ivan Illich's
proposal to disestablish schools is a good example of what would be
metamorphic change in education. Quite obviously, this kind of change would be
utopian, and the probability of its taking place in education is indeed slim.

Within each of these decision-making settings, there are thousands of
specific educational decisions that are categorized by the authors into
another foursome: 1) planning decisions to determine objectives; 2)
structuring decisions to design the means or procedures to be used to attain the
objectives; 3) implementing decisions to watch over and refine the procedures;
and 4) recycling decisions to judge and react to the outcomes or attainments
of the objectives.

Corresponding to each of these four decision types are the four types of
evaluation for which the model was named: context, input, process, and
product. Context evaluation is the most prevalent type of evaluation used in
education. The major objective of context evaluation is to determine needs,
specify the population and sample of individuals to be served, and devise
objectives designed to meet these needs. The procedures for context
evaluation include: 1) defining and describing the environment in which the change
is to occur; 2) identifying unmet needs and necessary and available
resources; 3) identifying sources of problems or deficiencies in meeting these
needs; and 4) predicting future deficiencies by considering the desirable,
expected, possible, and probable outcomes. In other words, context
evaluation provides the rationale for justifying a particular type of program.
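
The correspondence between decision types and evaluation types can be written down as a simple lookup. The pairing below follows the order in which the text lists the two foursomes; the names `Decision`, `Evaluation`, `SERVES`, and `evaluation_for` are hypothetical and chosen only for this sketch.

```python
from enum import Enum


class Decision(Enum):
    PLANNING = "planning"          # determine objectives
    STRUCTURING = "structuring"    # design means or procedures
    IMPLEMENTING = "implementing"  # watch over and refine procedures
    RECYCLING = "recycling"        # judge and react to outcomes


class Evaluation(Enum):
    CONTEXT = "context"
    INPUT = "input"
    PROCESS = "process"
    PRODUCT = "product"


# Each decision type is paired with the evaluation type listed in the same position.
SERVES = {
    Decision.PLANNING: Evaluation.CONTEXT,
    Decision.STRUCTURING: Evaluation.INPUT,
    Decision.IMPLEMENTING: Evaluation.PROCESS,
    Decision.RECYCLING: Evaluation.PRODUCT,
}


def evaluation_for(decision_name):
    """Return the name of the evaluation type that serves a decision type."""
    return SERVES[Decision(decision_name)].value


for decision in Decision:
    print(decision.value, "->", evaluation_for(decision.value))
```

Read in order, the evaluation types spell out the acronym CIPP.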

Context evaluation, according to Stufflebeam (90), addresses these
questions:

1. What unmet needs exist in the context served by a particular
institution?

2. What objectives should be pursued in order to meet these needs?

3. What objectives will receive support from the community?

4. Which set of objectives is most feasible to achieve?

Unmet needs can be determined by examining the goals of the school and
students' performance, comparing them, and noting any "discrepancies."
The differences represent unmet needs. Which objectives should be pursued
in order to meet these needs depends on the conditions that account for the
differences. Stufflebeam suggests that literature published by other
evaluators who have experienced similar problems may help to explain why
students failed to reach desired criterion levels. Which objectives will be
supported by the community can be determined simply by polling or
interviewing representatives of community groups. Determining which
objectives are most feasible involves estimates of costs and of resources available
to the school and community.
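
The screening questions of context evaluation (unmet need, community support, feasibility) can be combined in a short sketch. Everything here is an assumption made for illustration: the `Objective` fields, the `screen` function, the 50 percent support threshold, and the sample figures are not taken from Stufflebeam.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    name: str
    goal: int         # desired criterion level
    performance: int  # observed student performance
    support: float    # share of polled community representatives in favor
    cost: int         # estimated cost of pursuing the objective


def screen(objectives, budget, min_support=0.5):
    """Keep objectives that address an unmet need, have support, and fit the budget."""
    return [
        o.name
        for o in objectives
        if o.performance < o.goal       # a discrepancy marks an unmet need
        and o.support >= min_support    # community backing (threshold is assumed)
        and o.cost <= budget            # feasible with available resources
    ]


# Hypothetical candidates, invented for illustration.
candidates = [
    Objective("reading", goal=80, performance=72, support=0.7, cost=5000),
    Objective("math", goal=75, performance=78, support=0.9, cost=3000),
    Objective("writing", goal=70, performance=60, support=0.4, cost=2000),
    Objective("science", goal=70, performance=55, support=0.8, cost=12000),
]

print(screen(candidates, budget=10000))
```

A real context evaluation would weigh these factors rather than apply hard cutoffs; the filter is only meant to show how the questions narrow a list of candidate objectives.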

The purpose of input evaluation is to determine how to use the resources
in order to meet the goals established for the program. The end product of
input evaluation is an analysis of alternative procedural designs or strategies
in terms of their potential costs and benefits.

Stufflebeam (90) suggests five questions that input evaluation should be
capable of answering:

1. Does a given project strategy provide a logical response to a set of
specified objectives?

2. Is a given strategy legal?

3. What strategies already exist with potential relevance for meeting
previously established objectives?

4. What specific procedures and time schedules will be needed to
implement a given strategy?

5. What are the operating characteristics and effects of competing
strategies under pilot conditions?

Decisions based upon information collected in input evaluations typically
result in the specification of materials, procedures, time schedules, facilities,
staffing, and budgets that will be necessary to promote attainment of a
particular set of objectives.

Process evaluation provides continuing, periodic feedback to program
managers on how the project is progressing once it has been initiated. The
objective of process evaluation is to detect defects in the design or its
implementation and to monitor the various aspects of the project so that potential
problems or sources of failure can be identified and remedied. As in
formative evaluation, the process evaluator collects information frequently and
reports it to the program manager as often as necessary to keep the project
progressing as planned.

Stufflebeam (90) suggests the following questions to be addressed by
process evaluations:

1. Is the project on schedule?

2. Should the staff be retrained or reoriented prior to completion of the
present project cycle?

3. Are the facilities and materials being used adequately and
appropriately?

4. What major procedural barriers need to be overcome during the
present cycle?

In addition to providing feedback for ongoing program improvement,
process evaluation yields a record or diary of the project which itself can 
prove valuable once the project has been completed. 

Finally, product (or outcome) evaluation measures and interprets
attainments at the end of a program and at appropriate cut-off points within it.
Product evaluation includes: 1) identifying congruencies and discrepancies
between the intended objectives and actual attainments; 2) identifying
unintended results, desirable or otherwise; 3) providing for objectives that
have not been met by recycling the program; and 4) providing information
for decision makers regarding the future of the program, that is, whether it
should be continued, terminated, modified, or refocused.

Despite the labyrinthine intricacy of the model and the perhaps needlessly
complex terminology, the CIPP model has been used extensively to guide
program evaluations throughout the field of education (18, 28, 39). It was one
of the first full-scale models that directed attention to the information needs
of decision makers. The CIPP model made evaluators aware of both the
variety and range of evaluative information that is necessarily a part of the
different types of decisions that have to be made in education and the different
settings in which those decisions have to be made.

In later works, Stufflebeam (88, 89) distinguished between evaluation for
decision making and evaluation for accountability. Evaluation conducted for
the purpose of decision making is proactive, similar in concept and practice
to formative evaluation. Evaluation for the purpose of accountability is
retroactive in nature and serves a summative role. Actually, all four types of
evaluation (context, input, process, and product) can be considered
formative when they provide information for program improvement and
summative when they provide information for decisions regarding a
program's future.

Dressel (21) illustrates this quartet within the context of the four
corresponding parts of an educational program: input, environment, process,
and output. Context evaluation contributes to decisions regarding the
environment, but it is also concerned with the interrelations of all of the program
parts. Input evaluation is concerned with clarifying goals and assessing the
use of resources. Process evaluation corresponds to the process elements,
analyzed in terms of their contribution to the attainment of objectives.
Output evaluation determines the discrepancy between intent and reality
and analyzes the factors contributing to the differences.

Although Guba and Stufflebeam do not provide a set of designs to
accompany the four types of evaluation their model accommodates, they do offer a
checklist of procedures for developing a design applicable to any of the four
types. The checklist consists of six major steps: 1) focusing the evaluation,
which means identifying and defining the decision situations or the goals of
the evaluation, the setting within which it is to be conducted, and the
policies within which it is to operate; 2) planning the data collection; 3)
planning the organization of the data; 4) planning the data analysis; 5) specifying
audiences, formats, means, and schedules for reporting the findings; and 6)
administering the evaluation, or providing an overall plan for executing the
evaluation design. Dressel (21) offers a more comprehensive and useful
checklist for planning an evaluation.*

* A. What is the purpose and backgroundof the evaluation? 

1. What inputs, environmental factors, processes, or outcomes are to be 
evaluated? 

2. What are the critical points at which evidence will be required for d<^- 
^ cisions? 

• <p 

3. What rules, procedures, assumptions, and principles»are involved in 
the decisions? *c ,7 

4. Who will make decisions and what is the process by which thcise.will 
be made? ' , 

5. DoQs theoverall situation suggest, require, or prohibit certain tactics 
and strategies? 

6! What timing considerations are involved? 

7. What are the limitations on costs? ' 

8. What are the specific evaluation tasks? 
V B. What information is to be collected? 

1. Are the particular items unambiguously defined and collectiblet)Xo]>- 
jective and reliable means? 

2. From where or from whom is the evidence to be collected? 
X By whom is it to be collected? 

4. What instruments or procedures are to be used? * ^ ^ 

^ 5. Will the collection of evidence in itself seriously affect the input, envi- 
ronnient, process, or outcomes? 

6. Will the collection of evidence become; a regular part of the process, 

or is i| an add-on for a one-time evaluation 
«K • " * • 

7. What is the schedule for collection of information? 



•Paul L. Drciic?, Handbook of Academic Evaluation, o 1976, pp. 23 25. Rcprsnlcd by permission of iosscy- 
B«ss, Inc., San Francisco, Calif. ' - ^ 
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C: 'What procedures will be u&ed for organizing and analyzing data? 
1; In what form is information to be collected? 

2. Will-codirig be required? If subjective judgments will be, required in 
coding, are the criteria for these adequate? Who will do the coding? 

^ 3, How will the data be stored, retrieved, and processed? 

4; Whatanalytiaproceduresaretobeused? 

D. Is the reporting procedure clear? 

1. Who will receive reports?

2. Will reports be organized by analytic procedures, by type of data, or
by decisions to be made?

3. Will reports include the practical implications regarding the various
possible decisions to be made or leave these implications for the
project staff or administrators to ascertain?

4. Is the evaluator to state explicitly the particular decisions which he
believes are supported by the evidence?

5. When and in what detail are reports to be made?

E. How is the evaluation to be evaluated?

1. Who will be involved: project staff, the evaluator, decision-makers,
some presumably more objective individual?

2. What will the criteria used in this second-level evaluation be: costs,
program improvement, impact on further planning of related enterprises?

3. To whom and when is this report to be presented?

4. What decisions are to be anticipated as a result of the report? Will
they include improvement of evaluation processes in the future?

It should be noted that Dressel suggests an additional step not included by
Guba and Stufflebeam, an evaluation of the evaluation, asserting that
evaluators must assume at least partial responsibility for unsuccessful
evaluations. This point will be discussed further in the concluding section of
this monograph.

The CSE Model

The final model that we will discuss is the decision-oriented model
developed at UCLA's Center for the Study of Evaluation (CSE) and
described by its former director Marvin Alkin. The foundation for the model
is Alkin's (2) definition of evaluation:

Evaluation is the process of ascertaining the decision areas of concern,
selecting appropriate information, and collecting and analyzing information in
order to report summary data useful to decision-makers in selecting among
alternatives.

Because the definition, as well as the assumptions on which it is based, are
closely tied to the decision-making process, evaluations are classified
according to five decision categories and the kinds of information required for
making the decisions. Alkin refers to these as evaluation need areas.

The first need area is called systems assessment and refers to evaluations
that are necessary to provide information about the current status of the
system. The difference between what is and what is desired represents a
need and results in a statement of objectives written in terms of desired
program outcomes. The second area, program planning, refers to information
that will help the decision maker select a particular program that is
likely to be effective in meeting the specified needs identified in the first
stage. The function of the evaluator is to provide information concerning the
potential effectiveness of different courses of action so that decision makers
can choose the best from among the alternatives presented.

Once the program has been selected (or designed), an evaluation of
program implementation provides information concerning the extent to
which the program is being carried out in the way it was intended and
information showing whether or not it is being provided to the group for which
it was intended in the program plan. Program improvement, a fourth need
area similar to formative evaluation, requires evaluative information
concerning the manner in which the program is functioning: the attainment of
en route objectives, the presence of unanticipated outcomes, and the relative
success of the different parts of the program. Information collected in
this stage should include data on the extent to which the program is achieving
its intended objectives and information concerning the impact of the
program on other processes and programs.

The fifth and final area of the CSE model is program certification. It is
similar in concept to summative evaluation: the evaluator's function is to provide
information concerning the worth of the overall program, again in terms of
both the extent to which the objectives have been attained and the program's
impact on the outcomes of other programs. The information collected by the
evaluator at this stage should enable the decision maker to make decisions
regarding the future of the program. As in the CIPP model, the decision
maker has four choices: to retain the program as is, modify it, disseminate it,
or terminate it.

Stages two through five are similar to the first four stages of the
Discrepancy Model, and the first two and the fifth stages are similar to the CIPP
model's context, input, and product evaluations. Process, as defined in the
CIPP model, has been separated into program implementation and program
improvement, and as far as Alkin is concerned, cost-benefit analysis, the
fourth stage of the Discrepancy Model, is assumed to be part of every stage
in his model.

The advantage of the CSE Model is that it is applicable to the evaluation of
both discrete, definable instructional programs and broad-scale educational
systems. In fact, Alkin argues that evaluations at the macro level of large
educational systems require total examination beyond determining the
extent to which program objectives have been achieved. For large-scale
evaluations, the examination must include inputs, descriptions of alternative
processes used within the system, descriptions of the input-output relationship,
and data on unanticipated outcomes or consequences in addition to data
on the achievement of intended or desired objectives. Unfortunately,
Alkin's advice has not often been heeded.

Some New Approaches 

Although not exactly models in the strictest sense of the word, the Modus 
Operandi Method and the Adversary Approach to evaluation must be men- 
tioned, even if briefly, since they will both no doubt receive greater attention 
in the near future. 

The Modus Operandi (MO) Method is suggested by Scriven (74) as an
alternative when experimental or quasi-experimental designs cannot be used.
The theoretical base of the MO method, which derives from procedures
employed by historians, detectives, anthropologists, and engineering
"troubleshooters," is really quite simple. A program is investigated to see if it was
the cause of a certain set of effects. As Scriven explains, "the MO of a
particular cause is an associated configuration of events, processes, or
properties, usually in time sequences, which can often be described as the
characteristic causal chain (or certain distinctive features of this chain)
connecting the cause with the effect."

Certain effects are assumed to be caused by one or more factors, which
Scriven calls a "quasi-exhaustive causal list." The presence of each of these
factors is checked, and if only one is present, the investigator checks for a
"causal chain" (the configuration of characteristic events, processes, or
properties that may connect the cause with the effect). If one causal chain is
present, that chain (not the butler) is the cause. If more than one complete
chain is present, the possible causes associated with them are considered
co-causes.
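
The checking procedure just described can be sketched in a few lines of Python. This is only an illustration of the logic, not Scriven's own formulation: the `Cause` class, the `modus_operandi` function, and the example causes and events are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Cause:
    """One entry in a quasi-exhaustive causal list (hypothetical example data)."""
    name: str
    chain: Tuple[str, ...]  # characteristic events linking the cause to the effect


def modus_operandi(causes, present_causes, observed_events):
    """Return the names of causes that are present and whose full chain was observed."""
    observed = set(observed_events)
    confirmed = []
    for cause in causes:
        if cause.name not in present_causes:
            continue  # the factor itself was absent, so it is ruled out
        if all(step in observed for step in cause.chain):
            confirmed.append(cause.name)
    return confirmed


causal_list = [
    Cause("new reading program", ("teachers trained", "materials used daily")),
    Cause("smaller classes", ("enrollment dropped", "more individual attention")),
    Cause("test coaching", ("practice tests given",)),
]

result = modus_operandi(
    causal_list,
    present_causes={"new reading program", "test coaching"},
    observed_events={"teachers trained", "materials used daily"},
)
print(result)  # one name means a single cause; several names mean co-causes
```

In this made-up case, test coaching is present but its chain is not observed, so only the reading program survives the check.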

Although Scriven suggests using the MO method in situations where
classical design cannot be used, he also argues that even in experimental
studies some attention should be given to the questions implicit in the MO
approach: "What are the means whereby the putative cause is supposed to
be bringing about the effect? What are the links in the causal chain between
them? Can we look for these links or arrange that they will be easy to look
for? Can we use their occurrence to distinguish between the alternative
causal hypotheses? How?"

The MO method is still in a theoretical stage and has not been tested in 
actual evaluation practice. However, it offers evaluators a logical alternative 
to employ in appropriate situations, and in line with Scriven's other 
contributions, could ultimately prove useful. 

The Adversary Approach offers less promise, at least according to some
who have used it in practice, for example, Popham and Carlson (62). First
suggested by Guba (3.), the Adversary Model derives its origins from the
legal model of advocate/adversary conflict and confrontation and third-party
resolution. Although there are several variations in the actual way it is
applied to evaluation (and the reader is urged to consult the several
descriptions of the approach),* Adversarial Evaluation basically involves two
separate evaluation teams (or individuals): one chosen to represent the
program in question and gather evidence in its favor; the other to represent a
competing program, or, in the absence of a competing program, to gather
evidence and present a case against the program. The results of the two
evaluations are presented either in written reports or in a traditional debate
setting, with the decision makers rendering the final verdict.

In theory, the Adversary Model seems to be an ideal way in which to be
assured of a truly objective evaluation, and its champions extol this virtue.
But, according to Popham and Carlson (62), the model has several serious
defects: it is dependent upon the two competing evaluation teams having
equal skills and on the commitment and fairness of the "judges"; there is no
adversary court of appeals to which an improper ruling can be protested; it is
expensive; and lastly, most educational decisions are not amenable to the
binary choice of a winner/loser or go/no-go adversary contest. Educational
decision makers need many more options concerning the future of a program
than just those of maintenance or termination. The ultimate fate of the
Adversary Model will have to await more reports of its use in actual
evaluations. Perhaps when guidelines for its use are refined, some of the
deficiencies encountered by Popham and Carlson will be remedied.

Citations

Countenance Model: Stake (83)

Differential Evaluation Model: Tripodi, Fellin, and Epstein (94)

Priority Decision Model: Boyle (9)

Trade-off and Comparative Cost Model: Glass (30)

Systems Approach Model: Yost and Monnin (112)

Cost Utility Model: Costa (1973)

Ontological Models: Peper (58)

Synergistic Models: Hunter and Schooley (43)

Ethnographic Models: Dobbert and Dobbert (20), Wilson et al. (108)

*See Guttentag, M. (38); Kourilsky, M. (46); Levine, M. (47); Owens, T. (56); Wolf, R. L. (109); and Wolf,
Potter, and Baxter (110).

EVALUATION DESIGNS

The evaluation models described in the previous section represent the major
paradigms of educational program evaluation; they have been used to guide
many evaluations and they have influenced the thinking of many practicing
evaluators. Models provide a broad base for designing evaluation activities
by offering a framework and conceptualization that guides both the focus of
the evaluator and the orientation of the evaluation. But models do not
provide strategies for implementation. "Although models may help the
evaluator isolate the types of decisions to be made, they do not provide
procedural guidelines regarding how those decisions should be made" (60).
Guidelines are provided by the design, which establishes the conditions and
procedures for collecting the data required to answer the questions of
concern. The design must be related to the type of program or service being
evaluated; that is, the selection of a particular design is guided by the
decisions that will have to be made as a consequence of the data. In turn, the
adequacy of a particular design can be determined by the extent to which the
results may be interpreted and the questions answered. In most cases,
evaluation designs have been borrowed from research.

For example, Campbell and Stanley (12) distinguish between three types
of research designs commonly used in evaluation (pre-experimental,
experimental, and quasi-experimental), evaluating a number of specific
designs in each category according to their ability to withstand threats to
their validity. That is, the criterion differentiating the three groups of
designs, as well as the quality of the designs within each group, is the extent
to which the design protects against the effects of extraneous or nonprogram
variables, thus legitimizing the results that are attributable to the program.
More specifically, the criterion is the extent to which the design protects
against eight threats to internal validity:* eight kinds of variables,
extraneous to the program, that, if not controlled, will affect the outcomes of
the program and thus the accuracy of the interpretations that can be made of
the data.

*Campbell and Stanley also describe threats to external validity that jeopardize the generalizability of the
findings. Although some writers argue that generalizability is (or should be) an important consideration in
program evaluation, most others feel, as we do, that generalizability is not a major concern in most
educational program evaluations. For a description of threats to external validity, the reader is referred to
Campbell and Stanley (12).

The eight threats to internal validity are as follows: 

History: Outside events, such as changes in factors like the job market, 
the economy, or television programming, can affect the subjects of a 
program and thus the program results. Outside events are likely to occur 
when the program being evaluated extends over a long period of time.

Maturation: Processes within respondents, such as fatigue or growth,
produce change as a function of the passage of time. Natural growth alone
may sometimes be responsible for changes that are observed in a program
evaluation. Weiss (99) describes the problems confronted in evaluations of
delinquency prevention programs that do not have control groups.
Because young males generally become less likely to commit crimes and
more likely to hold jobs around the age of 17 or 18, when such results
appear in program evaluations, they cannot be attributed to the prevention
program unless a control group has been incorporated in the design.

Testing: The effect of a test on the scores of a second test, as in the
pretest-posttest design, prevents a true determination of the program
results.

Instrumentation: Changes in the instruments themselves, in calibration or 
difficulty level, or changes in the observers or scorers used affect the ac- 
curacy of interpretations. 

Selection: Biases resulting from the differential recruitment of the experi- 
mental and control groups affect the accuracy of interpretations. 



Statistical Regression: Non-program effects can appear during statistical
manipulations. When groups are selected for a study on the basis of
extremely high (or, more often, low) scores, their scores on subsequent
tests will tend to regress statistically; that is, move back toward the mean
of the group. The regression is an artifact of the statistics and not an effect
of the program.

Selection-Maturation Interaction: Selection biases result in differential
rates of maturation or changes as a function of time.
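
The statistical regression threat in the list above lends itself to a small simulation. The sketch below is illustrative only and rests on assumptions that are not in the text: scores are modeled as a stable "true" level plus independent random noise on each testing, the lowest-scoring tenth is selected, and no program of any kind is applied. All variable names and numbers are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N = 2000
# Each student has a stable "true" level; each test adds independent noise.
true_level = [random.gauss(50, 10) for _ in range(N)]
pretest = [t + random.gauss(0, 10) for t in true_level]
posttest = [t + random.gauss(0, 10) for t in true_level]  # no program at all

# Select the lowest-scoring 10% on the pretest, as a remedial program might.
cutoff = sorted(pretest)[N // 10]
selected = [i for i in range(N) if pretest[i] < cutoff]

pre_mean = statistics.mean(pretest[i] for i in selected)
post_mean = statistics.mean(posttest[i] for i in selected)
overall_mean = statistics.mean(pretest)

print(f"selected group pretest mean:  {pre_mean:.1f}")
print(f"selected group posttest mean: {post_mean:.1f}")
print(f"whole group pretest mean:     {overall_mean:.1f}")
```

Under these assumptions the selected group's posttest mean rises toward the overall mean even though nothing was done to the group, which is the artifact a one-group design could mistake for a program effect.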

True experimental designs protect against all of these possible threats to
internal validity; quasi-experimental designs generally protect against most
of them. Quasi-experimental designs require the same rigor, but they are
more practical than the true experimental model in many real-world
situations. Pre-experimental designs totally lack control and, according to
Campbell and Stanley, are "of almost no scientific value." Examples of
pre-experimental designs are: 1) the one-group, pretest-posttest design, in which a
group is pretested, exposed [...] with a group that has not [...]
of case studies in a subsequent section of this chapter.

Quasi-experimental Designs

Because of the difficulty of conducting true experiments [...] education,
quasi-experimental designs have become [...] both research and
evaluation [...]. The designs gained respect under Campbell and [...]
designs described on the following pages [...] features that [...] and
Stanley (12).

[...] probably the most commonly [...] or intact groups whose
characteristics are [...] groups are used as controls. Pretest and [...]
analysis. Weiss proposes using [...] the time series
design has many advantages to offer. A series of measurements are taken of
the participants before, during, and after the onset of a program, with the
before measures establishing a baseline performance level against which to
measure changes. The measures are examined to determine an "effect
pattern" or trend to show the impact of the program over time.

The multiple time series design provides more rigor by adding an
additional group and examining the series of measurements for both groups. If
the program evaluated has been effective, the effect pattern for the two
groups should be markedly different. A major advantage of the time series
design is that it is a fairly powerful design, providing excellent information
on the effects of a program even when a comparison or control group cannot
be used. Time series designs are particularly well suited for longitudinal
evaluations and social action evaluations where the program cannot be
withheld from appropriate participants.
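
One way to make the idea of an "effect pattern" concrete is to project the baseline trend forward and compare it with what was actually observed after the program began. The sketch below does this for two groups, as in a multiple time series design. It is an assumption-laden illustration, not a method prescribed by the authors: it assumes equally spaced measurements and a linear baseline trend, and the function names and scores are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for a simple trend line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def effect_pattern(baseline, after):
    """Observed post-program scores minus the projected baseline trend."""
    times = list(range(len(baseline)))
    slope, intercept = linear_fit(times, baseline)
    start = len(baseline)
    return [
        score - (intercept + slope * (start + k))
        for k, score in enumerate(after)
    ]


# Hypothetical measurements taken at equally spaced occasions.
program_group = {"baseline": [40, 42, 44, 46], "after": [55, 57, 60]}
comparison_group = {"baseline": [41, 43, 45, 47], "after": [49, 51, 53]}

program_effect = effect_pattern(**program_group)
comparison_effect = effect_pattern(**comparison_group)
print("program group departures from trend:   ", program_effect)
print("comparison group departures from trend:", comparison_effect)
```

In this made-up data the comparison group simply continues its baseline trend while the program group departs from it, which is the kind of markedly different pattern the design looks for.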



Experimental Designs 

Although some writers acknowledge the difficulty of applying controlled
experiments to the problems of education, and more than a few add the caveat
of "where conditions allow," experimental design is to many educators the
cornerstone of evaluation, the ideal methodology for educational program
evaluation.* Campbell and Stanley (12) state unequivocally that they are

. . . committed to the experiment: as the only means for settling disputes
regarding educational practices, as the only way of verifying educational
improvements, and as the only way of establishing a cumulative tradition in
which improvements can be introduced without the danger of a faddish
discard of old wisdom in favor of inferior novelties.

Classic experimental design incorporates two important techniques that
together rule out the possibility that something other than the program
caused the observed results, and thus they confirm the legitimacy of the
interpretations made from the data. These techniques are the use of control or
comparison groups and randomization. Quite simply, this means that
samples of the target population are randomly selected and assigned to either the
experimental group receiving the treatment (program) or the control group,
which receives a different treatment or no treatment. Members of the two
groups are posttested after the program has been completed, the differences
are compared, and the experimental program is pronounced a success if the
experimental group has more of whatever the criterion variable is than the
members of the control group. That the experimental group had fewer
cavities after using Crest should by now be a familiar slogan.

*See Aronson and Sherwood (4); Campbell (10); Evans (26); Glennan (32); Houston (42); Popham (60); Porter
(63); Rossi (72); Scriven (74, 76, 77); Stanley (86, 87); Welch and Walberg (102); Wholey et al. (107); and
Weiss (98, 99). Evans (26) makes a compelling argument in favor of small-scale controlled experiments to test
the relative effectiveness of alternative program techniques as a precursor to the introduction of massive
national programs.

The essential feature in experimental designs is randomization, which
increases the probability that subjects who form the control group are
basically equivalent to those in the experimental group. In the Crest
experiment, this meant that the people who formed the control group and used
Brand X were, as far as the experiment was concerned, no different from the
people in the experimental group using Crest, at least not until they
completed the program. Controlled experiments reduce the possibility that
something other than the program caused the results. Suppose, for example,
that subjects were not randomly assigned to the Crest and Brand X groups
and it turned out that the subjects in the Crest group lived in a community
that introduced fluoridation into the water soon after the study had begun.
Suppose, at the same time, that the majority of Brand X subjects lived in a
community that did not have fluoridated water. Quite obviously, the
inference that the continued use of Crest results in fewer cavities would have
been suspect, and Arthur O'Connell would have been out of a job.
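
The fluoridation example can be illustrated with a short sketch of random assignment. Everything in it is hypothetical: the subject pool, the 50/50 split between communities, and the function names are invented for illustration, and the seed is fixed only to make the run repeatable.

```python
import random

rng = random.Random(42)  # seeded so the sketch is repeatable

# Hypothetical subject pool: half live in a community with fluoridated water.
subjects = [{"id": i, "fluoridated": i % 2 == 0} for i in range(1000)]


def randomly_assign(pool, rng):
    """Shuffle a copy of the pool and split it into two equal-sized groups."""
    shuffled = pool[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def share_fluoridated(group):
    """Fraction of a group living in the fluoridated community."""
    return sum(s["fluoridated"] for s in group) / len(group)


experimental, control = randomly_assign(subjects, rng)

# Non-random alternative: assign by community, which confounds the comparison.
by_community_exp = [s for s in subjects if s["fluoridated"]]
by_community_ctl = [s for s in subjects if not s["fluoridated"]]

print("random:      ", share_fluoridated(experimental), share_fluoridated(control))
print("by community:", share_fluoridated(by_community_exp), share_fluoridated(by_community_ctl))
```

With random assignment the two groups end up with roughly the same share of fluoridated-water residents, so fluoridation cannot easily explain a difference in cavities; with assignment by community, the toothpaste and the water supply are completely entangled.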

Without question, experimental design can be a powerful tool. If people
can be randomly assigned and if there are enough of them available to form
an experimental and a control group; if the control group will not be harmed
or deprived psychologically, socially, or financially by not receiving the
program or by receiving a placebo program; if the program is a specific,
definable entity; and if the objectives are explicit, then an experimental
design is probably the best choice. If the evaluation proceeds smoothly and
if the instruments and measures are valid and reliable and appropriate to the
objectives, then, if the experimental group shows greater change
than the controls, we can be fairly certain that the change is due to the effect
of the program.

Programs do not exist in apolitical or ideal contexts and compromises
in design are inevitable. There are innumerable occasions when forming
control groups and randomization are difficult; there are many situations in
which it is impossible. Sometimes programs have to be offered to intact
groups, such as classrooms already formed according to school schedules.
Sometimes groups available for comparison are too dissimilar. Greenberg
(33) and Weiss (98, 99) both comment on the problems associated with
finding truly equivalent groups or communities where randomization has been
possible and note that the alternative usually used, that of matching, is not a
satisfactory solution. For every factor on which groups are matched, there
are other equally, if not more important, variables on which they are
unmatched. It is these variables that may in fact exert more influence on the
outcomes than the variables on which the groups are supposedly matched.

In other situations, programs must be provided on a voluntary basis and
made available to all who apply. This is particularly true in the case of social
action programs whose primary purpose is to shift the position of a specified
target group relative to the rest of society. Few administrators, or program
evaluators for that matter, would be willing to deprive people of programs
that would be of benefit to them. As Suchman (92) comments, it is difficult
both to refuse service to those who seek it and to force it upon those who
don't want it.

But, even when control groups are feasible, there are a number of
problems that interfere with the operations of an experimental design. First,
experimental designs are particularly vulnerable to "Hawthorne effects."*
Regardless of randomization, the results of a program can become
contaminated if either the experimental or the control group finds out that
it is participating in a "study" and becomes aware of its special status.
Experimental participants may try harder while their control group
counterparts may become annoyed or angry at being rejected by the program.
The change in their actions or attitudes will affect the outcomes of the
program. In addition, it is difficult to maintain contact with controls who are
not receiving an alternative or placebo program.

An added problem concerns contamination of the control group. Mann
(51) observed that in an organizational setting, innovations sometimes
"spread like a disease" to control groups. Rossi (72) notes in addition that a
changing economic or political climate can make available to the controls
programs or services that are essentially equivalent in many respects to the
program or services being evaluated. It is far easier to implement a rigid
evaluation in programs operating in highly centralized organizations such as
prisons, hospitals, or boarding schools in which the organization maintains
strict control over its members and the evaluator can thus maintain strict
control over the design.

Still, as Weiss (99) clearly points out, ingenious adaptations can be made
to alleviate, and in many cases eliminate, most of the problems that beset
experimental design. Scriven (76) suggests the use of multiple experimental
groups to separate Hawthorne effects from those of the programs. Weiss and
others (12, 24, 44, ^0) suggest the time series design in which the treatment
group becomes its own control through repeated measures of outcome
variables or in which two different programs are compared and the treatment
group of one program serves as the control group for the other and vice
versa. Rossi (72) proposes a two-stage evaluation consisting of a
reconnaissance phase in which non-experimental designs are used to screen out
programs that should (and can) be investigated further and an experimental
phase in which powerful controlled experiments are used to evaluate the
differential effectiveness of a variety of programs that demonstrated sizable
effects in the first phase.

*The term refers to a series of studies made at the Hawthorne Works of the Western Electric Company
between 1927 and 1932. Researchers found that workers increased production whenever they became the
subject of attention in a study. The "Hawthorne effect" has subsequently been found in many research and
evaluation situations where experimental designs have been used.

The experimental model has been challenged not only because of the
inherent difficulties in using such designs but also because in many instances
experimental designs are counter-productive to the needs and goals of the
evaluation. As we pointed out earlier, the design must be suited to the
purposes of the evaluation. If the purpose of an evaluation is to find out how
well a particular program achieved its goals, an experimental design is ideal;
if decision makers are concerned with program implementation, participant
satisfaction, or information for program improvement, other designs are far
more appropriate. In these examples, experimental design would be
inadequate for the task.

The many limitations of experimental design, particularly those which
focus on the extent to which a program has achieved its objectives, are well
documented and will not be reiterated here. For more detailed discussions,
the reader is referred to Borich and Drezek (8); Guba (34); Riecken (66);
Rose and Nyre (71); Stake (84); and Wergin (103).

Most studies carried out under experimental conditions fail to assess the
impact of the program operating within functioning institutional or
organizational systems. The focus on objectives limits the evaluator's understanding
of the program and, despite Scriven's exhortations, attention is seldom paid
to the merit of the goals established for the program or to unanticipated
outcomes that may have far more important consequences than the goals
originally intended. An obvious example is a math program that significantly
improves children's understanding of mathematics but results also in their
hating math! Experimental designs do not take into account changes in goals
(or procedures) that frequently take place once a program is underway, and
they cannot provide the immediate formative feedback that programs often
need in order to identify and correct snags in their early stages of
implementation.

House (41) offers an interesting analysis of the problem, arguing that the
classical approach to program evaluation, in which learner performance is
measured on standardized tests of achievement (which implies that the
larger the gain, the better the program), is based on utilitarian ethics.
Utilitarian ethics stipulate that a society is just when its institutions are arranged
so as to achieve the greatest net balance of satisfaction as summed over all
individuals. The principle of utility is to maximize the net balance of
satisfaction. Thus, a common measure or index of the criterion is required so that
quantitative calculations can be made. In education, that measure is the
standardized test, and in the classic evaluation approach, the best
educational programs are those which produce the greatest gains in test scores
regardless of the distribution of those scores. Only the final, net score
counts, and, since it is averaged across all individuals, one person's loss is
balanced by another person's gain. The real effect of the program on
different subsets of individuals is masked.
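
A few lines of arithmetic show how a net average can hide what happens to subsets of participants. The gains below are invented purely for illustration and are not drawn from House or from any study.

```python
from statistics import mean

# Hypothetical test-score gains for two subsets of participants.
gains = {
    "subset A": [12, 15, 9, 14],
    "subset B": [-4, -6, -2, -3],
}

all_gains = [g for subset in gains.values() for g in subset]
net_gain = mean(all_gains)
gain_by_subset = {name: mean(values) for name, values in gains.items()}

print(f"net average gain: {net_gain}")
for name, value in gain_by_subset.items():
    print(f"{name}: {value}")
```

Judged by the net figure alone, this imaginary program looks like a modest success, although every member of the second subset lost ground.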

Most experimental designs that have been used in educational evaluation
fail to consider the manner in which the program was implemented or the
configuration of people, events, processes and practices, values and
attitudes that surround the program, affecting the environment in which it
operates and thus, at least presumably, its outcomes. It is not enough to
document that a program failed to work. It is essential to identify the
processes and other variables that combined to defeat it. Particularly in the
case of large social action programs, but even with small-scale educational
programs, the investigation of negative effects is an important issue. The
capacity of communities, organizations, institutions (and people!) to resist
change must be investigated and the factors that defeated a program
identified so that they can be used as a base for the design of a program that
is more likely to be effective.

Conversely, it is not enough to document that a program achieved its goals
and the extent that it did so. Equally important as the attainment of goals is
the concern with why the results occurred, what processes intervened
between input and outcome, how the program actually operated, what
nonprogram events may have affected participation, and what implications and
guidelines can be derived from the evaluation for program improvement and
replication. Experimental design alone cannot provide this essential
information.

Weiss and Rein (101) point out that in broad-aim programs, different
approaches are often used at the local level so that the programs in effect differ
from community to community. A description of the different forms and
approaches as well as the forces that shaped each would be important
information that cannot be obtained through traditional experimental evaluation.

Stufflebeam (89) contends that experimental designs are only appropriate
in product evaluations and, thus, are of minor relevance to educational
evaluation. Guba (34) goes further, stating that experimental design actually
"prevents rather than promotes changes" because the programs cannot be
altered if the data and interpretations about the differences between them
are to be unequivocal.

The same criticisms and shortcomings can be leveled against
quasi-experimental designs in which the usual thrust of the study is also the degree to
which desired goals have been attained. No matter how effective and useful
they are in some situations, again, little attention is paid to how the program
developed, what unanticipated consequences occurred, what variations
exist among the program's component parts or units, what outside events
affected either programming or participants, or to the adequacy of the program
operation and the capability of the staff. As Stake (83) suggests, most
classical designs were developed as a means of examining "minute details";
they were not developed for portraying the "whole cloth of the program."
The point is, evaluation designs must accommodate the characteristics and
informational needs of the program, not the other way around.

Process Evaluation: The Other Extreme

Unfortunately, the very real problems with experimental designs and the
deficiencies of quantitatively oriented evaluations that reached their height
in the era of accountability precipitated a reactionary movement to the other
extreme: an equally deficient process-oriented approach, alternately referred
to in the literature as transaction-observation, process-oriented, qualita-
tive, or illuminative evaluation.* These approaches, which derive primarily
from Stake's countenance model and his later "responsive" evaluation,
focus almost exclusively on the environment or "milieu," eschewing quanti-
tative output measures, and are preoccupied with program process.** Nonex-
perimental designs (pre-experimental in Campbell and Stanley's terms),
which were previously considered to be of little or no value to educational
evaluation (at most, a last resort), have suddenly come to be the method of
choice (49, 57, 79, 82). Most popular is the case study, in which the evalua-
tors "observe, inquire further and then seek to explain" (57). The data base
relies heavily on interviews and observations, often informal. The evaluator
documents and describes what it is like to participate in the program, how
participants feel about the program and the staff, how the staff feels about
the program and the participants, and what both parties believe to be the
significant features of the program. Surrounding elements of the organiza-
tion and environment are investigated and their relationship to the program
is explored. Anecdotes are collected and program documents are reviewed.
But the whole issue of program outcomes, the consequences of a
program, is totally ignored.

A goal-attainment model that excludes process data can only address the
issue of what has happened; it cannot respond to the broader question of
what was responsible for which outcome. Even more important, it cannot
provide information for program improvement and development. The
process-focused approach, which excludes outcome data, cannot deal with
either question.

In their extensive critical review of federally-sponsored evaluations,
Bernstein and Freeman (6) comment on a study whose data analysis tech-
niques included reviews of narrative descriptive reports and impressionistic
summaries obtained by means of the case-study approach as follows:

    We cannot avoid noting that this study indicated . . . that no measures of
    outcome were taken at all. Barring some very unusual circumstances, we
    would conclude that this study is illustrative of an evaluation which did not
    meet the basic requirements necessary to be classified as competent evalua-
    tion.

*Although Parlett and Hamilton have popularized the term illuminative evaluation, credit for coining the
phrase and suggesting the methodology and issues of concern belongs to Martin Trow, who spoke of the
need for illuminative evaluation in 1970.

**Process as used here is a broader concept than the traditional one where process evaluation means to de-
termine whether or not a particular program was implemented according to its plan and directed at the appro-
priate specified target population. As used here, program process refers to the resources and forces external
to the program that may affect its operations and includes, in addition to the above, an investigation of other
programs and components within the institution and the needs, resources, and attitudes of the larger community.

Sadly, this approach is particularly appealing to the fainthearted. Because
it typically eschews making judgments about the worth of a program
(probably a wise decision in view of its lack of rigor), this approach is ob-
viously tempting for those who wish to avoid the risk of finding their pro-
grams impotent. All they have to do is ask participants how they felt about a
program, chronicle how the administrators and the staff felt, describe the in-
stitution and the program, write up an interesting narrative report, and ig-
nore the fact that no matter how richly evocative or interesting the report,
the findings may well be distorted and untrustworthy.

Using somewhat different terms, Scriven (76) differentiates between ap-
proaches to educational evaluation in which the emphasis is on intrinsic cri-
teria and approaches in which the chief attention is given to extrinsic cri-
teria. Intrinsic criteria refer to the constitution, nature, or essence (the
qualities inherent in the subject of evaluation) and are associated with its
process. Extrinsic criteria are concerned with the effects of the program.
Both Scriven and Popham (60) argue that the emphasis on intrinsic criteria is
all too common in educational evaluation, and that most such studies are too
haphazard to be properly considered systematic evaluations.

Case study evaluations are seriously defective in a number of ways. At
best, they are vulnerable to the threats of history, maturation, selection, and
mortality. Because there is no design directing the data collection or
guidelines that establish parameters, case studies accumulate a huge bulk of
data, much of which is irrelevant and all of which is difficult to organize (101,
103). And, of course, there are no baseline measurements with which to de-
termine change or growth. But far more serious are the problems of bias and
subjectivity that are endemic to the case-study approach.

Case studies operate within relatively small units of analysis, and assess-
ing a program by judging only a few units exposes the study immediately to
sampling bias. There is great variation in the reports provided by inter-
viewees because of their biases, and this phenomenon is not eliminated by
"triangulation." A key concept in many case study methodologies, triangu-
lation is a term borrowed from Webb et al. (97) that refers to viewing the
problem from a number of angles and representing the perceptions of the
program by its different publics to ensure a fair evaluation. Areas of
agreement and conflict are identified and defined as the evaluator attempts to
find convergence of findings from a number of different sources (57). The
problem is that the perspective of a given "public" depends entirely upon
which members of that public are interviewed and what they are willing to
tell. There may well be, in fact, several different perspectives within a
particular public, and unless the interviewees are selected randomly, the
story they tell may well represent the biased view of only a few members or
at most one faction of the group. Riecken (66) attacks not only the sampling
bias in case studies, but also the lack of comparability between subjects'
reports, which severely limits statements of the extent to which particular
effects were produced.

Case study process evaluations will always be vulnerable to charges of
bias either on the part of the participants or the evaluator. Lucco (48) goes so
far as to question the "political underpinnings" of evaluations in which em-
phasis is placed on program operations and process. In order to bring a sem-
blance of quality control to a case study, the investigator must be
conscientious, skilled, insightful, and objective. But, even where evaluators
are paragons of brilliance, objectivity, and virtue, their observations are still
made from their personal frame of reference, and subjective bias is impossi-
ble to avoid.

House (41) and Stake (82) both attempt to justify the subjective nature of
case study evaluations by comparing the procedures to those of an anthro-
pologist or historian. An anthropologist observes a tribe or village in order to
describe its culture, the roles and relationships of the members, and the way
in which it functions. Historians describe events in order to identify patterns
and causal relationships between events. But anthropologists and historians
are interested in describing and interpreting only; they do not make judg-
ments nor do they need to make decisions. In education, we need to make
decisions, and evaluation increases the rationality of these decisions.
Evaluation always has a heavily subjective component because it deals with
values, but that does not in itself excuse slovenly design or statistical
analysis (21). The intent should always be to move as far as possible toward
objectivity and clarity. Illumination is not evaluation.

Still, there are some situations in which the evaluator simply must use
limited methodological tools. Certainly, it is better to know how faculty and
students, or any subjects of a program for that matter, behave while they are
under observation than to know nothing at all about how they behave. Weiss
and Rein (101) suggest that informal approaches usually associated with ex-
ploratory research, such as the case study, may be appropriate where the
relative contributions of various components of a large-scale program are
difficult to determine because of the participants' uncontrolled exposure to
the program or where it is difficult to select and operationalize evaluative cri-
teria that are sufficiently broad in scope to reflect the program's full range of
consequences. They also suggest that qualitative appraisals by means of
case study can be used to describe the variations in social action programs
from community to community in combination with an assessment of overall
program outcomes through experimental design.

In these situations, observational techniques and interviewing can provide
useful (and rapid) additional feedback. And, as an exploratory analysis,
case-study data may provide the evaluator with suggestive leads concerning
significant variables that can subsequently be studied more rigorously with
an experimental design (103). Mann (52) is less sanguine, however, suggest-
ing that these leads may be suspect in light of the tremendous bias implicit in
the case-study approach.

As with experimental design, of course, the process-oriented case-study
approach has its own band of loyal followers for whom case study is the
method, enabling the evaluator to understand the whole of a program
through direct and vicarious experience. And without question, it is im-
portant to understand the "whole" of the program, including the dif-
ferences in perspectives between program planners and program operators,
differences in values and perspectives of different audiences, ways in which
the program operates, and other programs, people, events, or combinations
thereof that may influence the program under analysis. Understanding what
happens with respect to the political and social forces involved is essential if
a program is to address the issues or problems effectively. Few professionals
would deny that an understanding of process is important. One has only to
look at the legal profession, where the integrity of the process by which one
is brought to trial dictates the outcome. But, as Weiss (99) argues, critical as
it is to learn more about the process and dynamics of a program, it is
nevertheless equally critical to determine its outcomes.

Identifying the outcomes of a program is only part of an evaluator's task.
Unless an evaluation describes the actual program and the procedures and
processes that brought about the outcomes, it is presenting a half-told story.

But understanding the process without defining the outcomes is also an
unfinished story. A recent "human" interest story reported in the Los
Angeles Times (Monday, July 4, 1977) provides an amusing illustration of a
process-only orientation. The story was about an operation to reset an
elephant's broken leg in New Delhi. "The operation was successful; the
operated limb was corrected," claimed the team of veterinarians. And they
went on to describe how an army tank crane, 12-inch steel pins, welding
equipment, yards of plaster of paris and gallons of antibiotics were used in
the surgery. The fact that the elephant died of heart failure caused by
nervousness and excitement during attempts to get her on her feet during
postoperative procedures did not stop the veterinarians from stressing the
success of the surgery and was only peripherally noted. The story headline
read "Operation Success but Elephant Dies."

Clearly, both the process-only and the outcome-only approaches are 
inadequate for the evaluation of educational and social programs. What is 
needed is a methodology that combines rigorous experimental data with "a
natural history account of events and actors before, during and after
program implementation" (5). Integrated evaluation approaches such as the
ones described in the next chapter may well provide the answer. 

INTEGRATED APPROACHES TO
PROGRAM EVALUATION

The concept of integrated evaluation is not new. As far back as 1963, Cron-
bach stressed the need for evaluating the interactive events or "process" of
the classroom in addition to the learning outcomes. Scriven (76), too, offered
what he called mediated evaluations, which combined attention to both in-
trinsic and extrinsic criteria. Suchman (92) proposed four different kinds of
evaluation: evaluation of effort, or the amount of action involved in es-
tablishing a program; evaluation of effects, or the results of the action;
evaluation of process, the way in which the effects were achieved; and
evaluation of efficiency, the ratio of costs to effects. And, in Stake's
Countenance Model described earlier, transactions are equivalent to
process.

An integrated evaluation approach is a hybrid of the two polar positions
described in the last chapter, one that combines the study of program
process with the study of outcomes. In this section, we will focus on two
examples of integrated approaches: holistic evaluation and transactional
evaluation. Brief descriptions of these programs will be followed by exam-
ples of actual evaluations in which these approaches were used.


Holistic Evaluation 

Holistic evaluation is an integrated, multidisciplinary approach to program
evaluation that investigates both process and product (45, 71). By broaden-
ing the paradigm, holistic evaluation enlarges the scope of questions that can
be asked and the body of data that can be collected. It includes descriptions
and quantification, objective data and perceptual reports, and it can accom-
modate experimental designs as well as case studies. Named to convey its
sense of comprehensiveness, not its "holiness," holistic evaluation rests
on six basic assumptions:

1. Programs do not exist in isolation. Educational and social programs are 
but one component within a broad system or organization in which 
program activities are carried out.

2. As such, programs receive influences from various people and groups 
with differing needs, interests, and points of view. 

3. Educational and social programs have different meanings and different 
implications for these different groups. 

4. The evaluation of these programs involves gathering information useful 
to the disparate groups of decision makers with direct input into the 
program as well as groups which may not be directly involved in the
program but whose decisions may nevertheless affect it.

5. Procedures for carrying out the evaluation must be appropriate to the
program and selected to provide the kinds of information that are re-
quired by the different groups of decision makers. (In other words, the
decision needs dictate the methodology of the evaluation.)

6. Most decisions, by their nature, require information about both
program process and program outcomes.

Holistic evaluations are thus concerned with four major areas: 1) the
social-psychological environment in which the program operates; 2) at-
titudes, values, interests, and perceptions of participants and surrounding
groups; 3) program and participant outcomes; and 4) the interaction of the
various elements comprising the system that may affect the operation of the
program and thus its outcomes.

Holistic evaluation is not a model in the strict sense of the word. It is a
conceptual framework with certain defined strategies from which program-
specific (and site-specific) procedures can be derived for either formative or
summative evaluations. Holistic evaluation has been used to evaluate four
federally funded programs in vocational education (45); a multi-campus
instructional development program for faculty (70); a statewide program
operating in three public segments of postsecondary education (69); a cur-
ricular program at a professional school (55); and a statewide program for
disadvantaged students (27). The last two evaluations will be described in
depth in the section of case studies to illustrate the holistic approach.



Transactional Evaluation 

Transactional evaluation is a term usually credited to Robert M. Rippey. Ac-
cording to Rippey (67), the actual meaning of the term is still emerging; it is
not yet fully developed. A synthesis of the writings of several transactional
evaluators, however, shows that transactional evaluation has certain at-
tributes that distinguish it from so-called traditional approaches to evalua-
tion.*

To begin with, transactional evaluation emphasizes a broad base of par-
ticipation. It involves not only the designers and supporters of a program,
but also a representative sample of antagonists: persons who are likely to
be affected adversely by the program or disturbed by the consequences of
change. Secondly, transactional evaluation stresses the value of conflict and
uses it as a basis for examining differences in perception among the various
groups. In transactional evaluation, the key is not consensus, but an ex-
ploration of the divergent views which result from different perceptions and
an examination of their implications for decision making. All new programs
create some dysfunction in existing school/community relationships. In
transactional evaluation, changes resulting from the creation or addition of a
program are continuously observed and resulting conflicts are brought to the
surface.

*The reader is referred to Rhine's (65) case study of the longitudinal evaluation of Follow Through, which
provides a good example of the distinguishing characteristics of transactional evaluation.

A third part of transactional evaluation is the Transactional Evaluation
Instrument, both a product and a process that permits protagonists and an-
tagonists to clarify their perceptions and uncover sources of conflict or per-
ceptions of conflicts that were submerged.

Finally, transactional evaluation differs from traditional approaches in the
emphasis it places on diagnosis and improvement rather than on establishing
the superiority of one program or method over another. Although, again,
there is some disagreement among writers, Scriven insists upon the im-
portance of designing evaluations as comparative experiments on the ground
that judgments of worth are comparative (31, 76). Transactional evaluation
is not concerned with comparative worth; it is concerned with social and
organizational relationships. According to Rippey (67), the key to the
transactional model's effectiveness "is the continuous evaluation by both
protagonists and antagonists, of both the expected and unexpected conse-
quences of change" in order to modify and improve the program.

Grounded in organizational theory, the function most suitable to transac-
tional evaluation seems to be the evaluation of institutional change projects.
As Rippey acknowledges, transactional evaluation is based on "a study of
internal conflict concomitant to change." Rippey includes transactional
evaluation as an essential step in a change strategy which proceeds first to
establish disequilibrium; increase differentiation; begin change on a small
scale under the best possible conditions (which Rippey later explains as first
working only with those who support the change); improve the climate and
organizational mechanism for change; and lastly, implement all new pro-
grams as temporary, small-scale pilot experiments so that the effects can be
studied without undue disruption to the entire social organization. Transac-
tional evaluation requires that protagonists and antagonists jointly establish
the criteria for assessing and measuring both the planned and unplanned out-
comes.

Transactional evaluation consists of two main stages. In the first stage, the
transactional evaluator aims to uncover the sources of conflict; in the second
stage, the evaluator uses both proponents and opponents to develop the
evaluation plan. In order of sequence, transactional evaluation proceeds as
follows.

First, all of the groups involved in or likely to be affected (directly or in-
directly) by the change (program) come together for a series of meetings.
Three conditions must be met during this first stage: 1) all groups affected
directly or indirectly should be represented; 2) a neutral party should
conduct the meetings; and 3) sessions should be conducted in a nonjudg-
mental manner. Although feelings of suspicion and distrust are prevalent,
the issues and sources of unrest may not be clearly defined and the problems
may not necessarily be those that are articulated. But the climate thus
created is a necessary condition for the subsequent development of the
evaluation plan.

The second stage involves construction of the transactional evaluation 
instrument, the key to unlocking conflicts and controversies. Again, 
everyone is involved in the process. The evaluator first formulates a general 
statement of the issue in the form of a question based on the feelings 
expressed in the initial meetings. Each participant in the group is then asked 
to respond to the question with a series of statements. These responses are 
collected, tabulated, and categorized with the original wording retained
wherever possible. These responses, in effect, become the items for the 
instrument. 

The transactional evaluation instrument is administered, and participants 
respond to each of the items appropriate to their role group (for example, 
teachers, administrators, students, parents, and so forth). Responses are 
tabulated, a master copy is prepared, and copies are distributed to partici- 
pants. Finally, the last and most important step is the examination of 
responses, which reveals the areas of shared values and goals and the areas 
of open conflict. 
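
The tabulation step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Rippey's procedure: the five-point agreement scale, the sample items, the role groups, and the `gap` threshold used to separate "shared" from "conflict" items are all assumptions introduced here, since the text does not specify a response format or a decision rule.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical responses: (role_group, item, rating).
# Assumed scale: 1 = strongly disagree ... 5 = strongly agree.
RESPONSES = [
    ("teachers", "The program increases workload", 5),
    ("teachers", "The program increases workload", 4),
    ("administrators", "The program increases workload", 2),
    ("administrators", "The program increases workload", 1),
    ("teachers", "Students benefit from the program", 4),
    ("administrators", "Students benefit from the program", 5),
]


def tabulate(responses):
    """Return {item: {role_group: mean rating}} for all responses."""
    ratings = defaultdict(lambda: defaultdict(list))
    for group, item, rating in responses:
        ratings[item][group].append(rating)
    return {
        item: {group: mean(values) for group, values in groups.items()}
        for item, groups in ratings.items()
    }


def classify(table, gap=1.5):
    """Label an item 'conflict' when the spread between the highest and
    lowest group means is at least `gap` (an assumed cutoff), else 'shared'."""
    labels = {}
    for item, means in table.items():
        spread = max(means.values()) - min(means.values())
        labels[item] = "conflict" if spread >= gap else "shared"
    return labels


if __name__ == "__main__":
    table = tabulate(RESPONSES)
    for item, label in classify(table).items():
        print(f"{label:8s} {item} {table[item]}")
```

A summary of this kind would correspond to the master copy distributed to participants; the interpretation of what counts as open conflict would still rest with the group, not with a numeric cutoff.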

In the second phase of transactional evaluation, the proponents and op-
ponents of the program (or a particular aspect of the program) develop and
implement an evaluation plan with technical assistance provided by the 
professional evaluator, who, according to several transactional writers, 
should be a fully participating member of the program staff. The presence of 
both those who are for and those who are against the program insures that 
program monitoring includes not only the outcomes intended by the
proponents but unexpected negative outcomes suggested by the opponents. 
Nonbelievers who are apprehensive about their roles once the new program
is implemented can often be reassured by direct action of the project, in- 
service training where necessary, or clarification of policy. But even more 
important, initial opponents can be given a legitimate role in the program, 
one that often leads to their conversion and ultimate support, or, at the least, 
their understanding and tacit agreement. Resistance may be identified and 
dealt with at each stage of the process of change— when the innovation is 
initiated, when it is being evaluated, when the findings of the evaluation are 
accepted, and when further changes in the program are recommended.

The insistence upon the involvement of both factions rests upon George
Simmel's (78) theories of working relations. Simmel argues that the basis for
a positive working relationship is an interaction in which both parties have
parity in the exchange; where the relationship is not reciprocal, one party is 
diminished and becomes dissatisfied in the relationship. 

Transactional evaluation is not as suitable for large-scale, summative
evaluations, although Cicirelli (15) suggests that even a large-scale summa-
tive evaluation, such as that of the Head Start Program for disadvantaged
children, may be made more effective if the two major principles of transac-
tional evaluation are incorporated into the evaluation; that is, the groups that
might feel threatened or adversely affected by the program (or the evalua-
tion) and thus resist it are identified, and representative samples of these
groups are involved in the evaluation from the planning stages through the
implementation stage and during consideration of findings and implications.

Transactional evaluation is similar in many respects to formative evalua-
tion, particularly in its concern for continuous diagnosis and program
improvement. But transactional evaluation broadens the scope of formative
evaluation by involving a larger group of individuals, eliciting a wider range
of opinions and values, and giving more continuous attention to information
concerning the institutional role. When a program of change looks beyond
the immediate outcomes of its intended goals, examines the roles and ap-
prehensions of all parties to the system, and attempts to continuously moni-
tor its total effects, that program is participating in holistic or transactional
evaluation. Case studies of transactional evaluation are presented in the next
section.



CASE STUDIES



The Evaluation of Social-Action Programs

The first two case studies described in this section concern evaluations of 
social-action programs— programs designed specifically to improve the life 
conditions of a particular group of people. These programs vary in scope:
some cover the nation; some, a state or a city; and some are confined to a
single site. Social-action programs also vary in size. Some serve thousands,
others, hundreds, and still others serve a relatively small number of people.
Some social-action programs are aimed at a clear-cut, single purpose, such
as improving children's ability to read. Others are more complex and are
aimed at alleviating a broad-based, pervasive social problem, such as pro-
grams designed to improve mental health or provide equal educational op-
portunity. These programs have at their base long-range goals which may or
may not be attainable within the lifetime of any one evaluator. (Contrary to 
some thinking, evaluators are mortal and they have only one life in which to 
evaluate.) 

Social action programs are rarely confined to a single locus. More often, 
program areas are legion, ranging from educational and social welfare to 
medical and legal services. The common thread that runs through these pro-
grams regardless of emphasis, however, is their goal of improving the life
condition of the people they are intended to serve. Because of the magnitude
of this goal, the vast sums of money that have been allocated in order to at-
tain it, and the variety of services and programs offered, evaluation can play
an important role in assuring that the programs serve the targeted population
in the most effective way.

Understandably, decision makers, particularly legislators and govern-
mental agencies charged with funding these programs, want to know if the
expenditure is justified. Is the program meeting the goals for which it was es- 
tablished? Can it do so for less money? Should the program be expanded, 
reduced, eliminated? Can the program be more effective if it is revised? 
Regardless of the legitimacy of these questions or the sincerity of the ques- 
tioner, however, these questions can be political and thus their answers 
politically loaded. In Cohen's (16) words, "Evaluating social action pro-
grams is only secondarily a scientific enterprise. First and foremost it is an
effort to gain politically significant information about the consequences of
political acts."

A most important issue for social action programs, in addition to overall 
worth, is program improvement. It is quite unlikely (probably more so for 
political reasons than for humane concerns) that any large-scale, broad-aim 
federal or state social program will be eliminated or even seriously limited as 
a result of any evaluation, no matter how inconsequential an effect the 
program appears to be having. What is more likely is that the results will be
used to make the programs more effective and more responsive to the needs 
of those the program is serving. Why is the program not more effective? 
How can the services be improved? What other services should be added? 
These questions are far more important for the ultimate solution of the prob-
lems these programs were designed to address. Still, attention must be paid
to both sets of questions within the context of each program site, taking into 
consideration local program variations. The problems in methodology as 
well as the constraints arising out of political and emotional factors are dis-
cussed in the following two case studies. The first case study illustrates the
holistic evaluation approach; the second case study provides an example of
transactional evaluation.



EOPS: A Case Study of Holistic Evaluation 

Extended Opportunity Programs and Services (EOPS) is a special program
established in the California Community Colleges for the purpose of provid-
ing equal educational opportunity to racial and ethnic groups and the
minority of whites who had formerly been denied access to college because
of deficient academic backgrounds and/or a history of poverty. In order to
help these people gain access to college and meet the demands of academic
life, the program provides financial aid and supportive services in the form of
tutoring and counseling.

EOPS was conceived in response to the civil rights movement of the l%Os 
and the consequent political pressures to remedy the neglect of large groups 
of people by our m^or social institutions. As with the rest of the country, 
California's higher. education system served primarily white, middle-class, 
economically advantaged students. Prior to the establishment of EOPS, a 
white middle-class student was twice as likely to enroll in college as was a 
member of a racial or ethnic minority. The wave of social consciousness that 
gave rise to the massive federal programs such as Project Head Start, Follow
Through, and Title I of the ESEA also stimulated the thinking of California's 
leadership, and in 1968, Senate Bill 164 established the Extended Op-
portunity Programs and Services in the California Community Colleges.

Like so many other social action programs established at that time, EOPS
expanded rapidly, growing from a $3 million program in 46 community 
colleges to a $7.6 million program in 94 community colleges in less than ten 
years. The speed with which EOPS escalated compounded many of its early 
deficiencies and added to the difficulty of its later evaluation. Staff were 
hastily selected without full attention to their qualifications as administra-
tors. Programs were often instituted without adequate consideration of the
goals and needs of individual colleges or the values and attitudes of the 
college and community membership. Many campus programs lacked careful 
planning. Participant data was seldom recorded, and few campuses docu- 
mented the process of implementation. Other than the head-counting reports
submitted annually to the Board of Governors, the policy-making body for
the California Community Colleges, no systematic evaluation of the program 
was ever undertaken. However, the economic recession of the 1970s, cou-
pled with the growing suspicion that massive social-action programs had not
significantly alleviated the country's major social problems, finally led to a
concern for evaluation, and in 1975 a "formative" evaluation of this multi-
campus program was conducted seven years after it began.

The Situation: The major purpose of the evaluation, as stipulated in the
evaluation contract, was to determine the extent to which the community
colleges had met the objectives of the legislation, those of the Board of 
Governors, and those of the individual colleges. 

All told, there were 31 major objectives, ranging from those aimed purely
at implementation (for example, "the community colleges shall establish
. . .") to those aimed at various student outcomes. The objectives, like the
program, were also seven years old, but it was clear that at least a part of the
design would be simple. A straight accountability approach could be used to
determine if the colleges did in fact do what they were supposed to do:
establish financial aid and supportive services for persons of minority and/or
disadvantaged background. But an additional charge of the contract was that
specific recommendations be made regarding program improvement at both
the state and local levels, and fulfilling this requirement was hardly a simple
matter.*

Broad-aim social action programs are not one dimensional; rather, they
are composed of a vast array of complex, interactive elements loosely called
a program. If the purpose of an evaluation is, at least in part, to provide in-
formation for making improvements in a program or particular parts of a
program, then it becomes necessary to distinguish the differential impacts of
these parts and the processes that contributed to them. In the case of the
EOPS evaluation, this task posed several methodological problems. For
example, two of the EOPS objectives concerned improving minority
students' self-concept and instilling in them pride in their cultural distinc-
tiveness. Both of these objectives are noble, and they are plausible within
the parameters of a social program designed to equalize educational op-
portunity. But to empirically distinguish the elements or parts of a program
directly aimed at improving self-concept or instilling cultural pride
necessitates not only a specification of criteria concerning what constitutes
positive self-concept and cultural pride, but, even more, a knowledge of
what in fact influences the development of these qualities. No one yet knows
what educational or social practices or policies contribute to self-concept
and cultural pride. We can speculate that a "supportive" environment (and
this, too, needs clarification) may contribute to a feeling of acceptance,
which, in turn, might enhance self-concept. But in the absence of a specific
program designed especially for improving self-concept which can be 
rigorously evaluated, it was impossible to determine with any degree of 
certainty the extent to which participation in the program generally, or in a 
certain program activity specifically, contributed to the enhancement of 
these qualities. The only practical alternative was to examine the results of 
participation in the program as a whole versus nonparticipation, and this 
opened up another methodological problem — the lack of control or 
equivalent comparison groups. 

As we said earlier, most large-scale social action programs defy rigorous
experiment, and EOPS is no exception. It is impossible to deny the program
to some people in order to form a control group. One does not assign people 
to treatment and nontreatment groups where financial aid is concerned, and 
it is too difficult to develop a placebo program that is different from the 
program being evaluated and yet of equal benefit to the participants. EOPS 
is a conglomerate of individual programs, and each needed to be evaluated
separately. An experimental design would have required a control group of
nonparticipants for each local project.

It was obvious that a classical experimental design could not be imple-
mented. It was equally impossible to identify an equivalent group for com-
parison purposes since, ostensibly, persons most in need of financial and
academic assistance were those recruited to the program. To identify a
group of disadvantaged students who were not enrolled in the program
would have been equivalent to admitting that they were not as needy.

*As it turned out, even the accountability phase was not simple, since many people disagreed about the
qualifications of the target population and what constituted a "disadvantaged" person.

As more and more problems and issues emerged, the requirements for the 
design became more complex. The diversity of settings in which the
California Community Colleges operate had led to tremendous variations in 
program orientation, style, and implementation. Different colleges had
adopted different approaches and somewhat different emphases in program-
ming in response to their different needs and goals, as well as those of the
community. Because of these differences, it would have been misleading to
evaluate EOPS from the state level or on the basis of a few selected pro-
grams.

There were, in effect, about 95 distinct programs.* It was clear that an im-
portant contribution of the evaluation would be a description of the various
program approaches and the forces that contributed to the different program
shapes.

Other problems resulted from the fact that the EOPS program itself had
changed over the years. The original objectives outlined for the program en-
visioned EOPS as a special, separate entity with a full array of financial aid
and academic and personal support services on each campus. Because of the
community colleges' historic charge to be responsive to local community
needs, and because of increasing federal aid programs, EOPS had evolved
so that on many campuses it was no longer distinct from similar programs
and services available for all students. Nor did every campus necessarily of-
fer all of the originally intended service components or emphasize them in
ways called for in the 1968 enabling legislation. In short, the goals and
activities of the program, as well as the criteria for program success, had
changed appreciably over the years. The programs did not exist in isolation;
they were part of a community college, a functioning institutional system,
and as such they were subject to the workings of the system as a whole.
Changes in any one part of the system influenced changes in all of the other
parts, and reciprocally, the EOPS program impinged upon the institutional
environment if for no other reason than that it existed. To try to ferret out
specific outcomes attributable only to the program was a Sisyphean task.

A third problem that emerged shortly after the study began was the dis-
covery that different groups of people at different levels of the program and
college hierarchy held quite different values and attitudes concerning both
the nature of the program and its major purposes. Some saw the purpose of
the program primarily as a means to increase the number of minority
students in the postsecondary population; some saw it as a means to placate
the colleges' liberal constituency. Some groups stressed quality over
quantity, believing that the goal of the program was to make the greatest im-
pact on the lives of the people participating irrespective of their numbers.
Other groups believed that the program should process as many people as
possible in the most economical manner. Still others saw the program's pur-
pose as providing an education to a large group that formerly did not receive
one.

*Another real world lesson is never to leave data on the floor. The custodian threw out all of the data that had
been neatly stacked on the floor at one college and it had to be eliminated from the study. The final sample
size was 93.

Not only were the criteria for program success different among these 
groups, but they expected to receive quite different information from the
evaluation. The legislature and the Board of Governors, for example,
wanted to know if EOPS students (supposedly "high risk, multiply disad-
vantaged") maintained grade-point averages and retention rates comparable
to students who did not participate in EOPS. The statewide community
college office was concerned with the coordination of the program and rela-
tionships between campus program personnel and the statewide office.
EOPS directors and staff on the campuses were concerned about program
delivery and wanted to know if students were satisfied with the support
services. Faculty and administrators had still other concerns.

To complicate things even further, policies governing community college 
enrollment in California decree that anyone who has a high school diploma 
or is over the age of eighteen may enroll. This means that, although some
records are kept once a student is enrolled (in most cases college grade-point
average), even minimal entry data is unavailable for many students. Reten-
tion data are complicated by the fact that students drop out, stop out,
transfer to other community or four-year colleges, or obtain program
certificates in lieu of Associate of Arts degrees. They may also become ineli-
gible for EOPS any given term due to lack of credits or failure to fill out re-
quired renewal forms. Thus, rigorous documentation of educational out-
comes, and particularly follow-ups of students' subsequent academic work,
has not been a hallmark of the community colleges' data collection
practices.

The ideal design for the EOPS evaluation would have been to determine 
long-range outcomes such as the extent to which the EOPS students became
happy and productive citizens, a pay-off too far in the future for the
program's immediate evaluation needs. In the absence of longitudinal data, 
it could only be assumed that present attitudes and behaviors were in some 
manner or another indicative of future attitudes and behaviors. In com-
promise, these considerations were incorporated in the survey given to the
student samples. 

Finally, by its nature, evaluation is a political activity. It provides in-
formation for decision makers and legitimizes their subsequent decisions.
Where decision making is in itself political, involving the allocation of
power, authority, position, or resources, evaluations frequently result in a
reallocation of resources. In this case, although there seemed to be no ques-
tion about the continued funding of the program (and shortly after the study
began, and for no apparent reason, the governor increased the entire state
EOPS budget by 50 percent), many of the people connected with the
program at both the state and local levels were fearful that the funding was in
jeopardy, that the proportions allocated to the various program components
might be shifted, and that the evaluation results might seriously endanger the
program. 

An evaluation approach was needed that would take all of these factors 
into consideration: a design that would be comprehensive; be attentive to both
process and product; be sensitive to the political nuances surrounding the
program and the consequent fears of a good many people; address the dif-
ferent information needs of various constituencies; allow for changes in the
goals; incorporate the different values, perceptions, and criteria of different
groups of decision makers with varying levels of power and influence over
the program; and compensate for the lack of pretest data on the participants.
The design also had to be flexible enough to accommodate 93 different pro-
grams and to be implemented within the constraints of a minimal budget and
a one-year time frame. At that point, the authors wrote a paper entitled
"How to Evaluate a Complex, Multi-campus Program in a Large State
System in the Real World of Higher Education where Campus Projects are
Diverse, Political Pressures Intense, No Control Groups Can be Formed and
No Evaluation Model Fits: or, Campbell and Stanley, Where are You
Now?"

The Strategy: The decision was made to develop a holistic approach to the
evaluation that emphasized careful documentation and description of 
processes and activities and at the same time focused on actual outcomes ir- 
respective of prespecified objectives or criteria. The design guided the 
procedures. The evaluation of outcomes necessitated quantitative data from 
students, faculty, and administrators. The description of processes required 
that representative programs be observed as functioning units. The holistic
evaluation designed to meet these information needs proceeded in two
phases.

The first phase consisted of a comprehensive survey of randomly selected 
samples of EOPS students, administrators, faculty, counselors, program di-
rectors, members of local advisory committees, superintendents of multi-
campus districts, and non-EOPS students. Since a major criterion for suc-
cess, as defined by the Board of Governors and the legislature, was that
EOPS students perform as well as other students, the relevant comparison
group for the study was the population of non-EOPS students enrolled in the
colleges. In this case, a nonequivalent comparison group was not only ap-
propriate but also essential to the purposes of the study.

The purpose of the survey was to compare the characteristics,
experiences, perceptions, and attitudes of representative samples of EOPS
students with those of non-EOPS students and to examine the attitudes,
values, and opinions of the program held by both EOPS and other college
staff members. Contrary to many surveys used in social research, however,
this one was not a "fishing expedition." Rather, it consisted of very specific
criterion instruments developed to measure each of the pre-established ob-
jectives set by the enabling legislation and the Board of Governors.

The second phase of the study consisted of intensive (and extensive) case
studies of twelve colleges which were systematically selected to represent
the diversity of the California Community Colleges in terms of size,
geographic region, urban/rural setting, ethnic mix, and programming em-
phasis.

The Survey: In order to develop relevant questionnaire items that would 
identify outcomes and processes, the full range of issues and questions sur-
rounding the study was first outlined according to all of the program docu-
ments: the enabling legislation, the Board of Governors' Statements on
Policy and Goals, Title V of the Education Code, volumes of documents,
data applications for funding, and previous in-house evaluations of programs
provided by the Chancellor's Office. These documents not only helped
enumerate key issues and questions, but they also provided a historical
perspective of the development of the programs on the different campuses
and identified the forces that shaped their implementation and subsequent
maturation. During the period of time in which the instruments were
developed, frequent meetings were held with the Chancellor's Office staff
and selected EOPS directors in order to more fully understand the attitudes
and behaviors of key groups involved in the program.

One hundred forty-six questions were cast into questionnaire items appro-
priate to each sample. The design called for comparisons of the different
samples, and consequently, there was much overlap between instruments,
particularly with respect to attitudes and opinions regarding the program in
general and the campus situation in particular. The preliminary set of ques-
tionnaires was pretested, and lengthy discussions were held with
representatives of each sample group who suggested additional items,
revised items, and deleted still others. The students were especially helpful
in identifying words that had hidden or slang meanings and otherwise clarify-
ing the language of the items for the student population. In the process, they
eliminated the unintentional but, nevertheless, insidious "educationese."
The revised instruments were submitted to the statewide office for review,
and after the reviewers' comments had been incorporated, they were
finalized and printed in booklet form, color-coded to represent the different
samples.

In addition to the survey questionnaires, a Basic Data Sheet was
developed for the [illegible] to gather baseline data on the vital statis-
tics of the local college and the campus programs. This information included
enrollment figures, funding allocations, staffing and, where available, data
on students' high school and college grade-point averages and retention.

During the final period of refining and printing the instruments, the study 
team also began to work with representatives from each of the 93 colleges.
The community college presidents had each designated a liaison person to
coordinate campus activities and facilitate communications between the
evaluators and the campuses. In some cases, the appointed liaison was the
campus EOPS director; in others, the Dean of Student Personnel Services;
and in a few cases, a faculty member served as a liaison.

Six regional training workshops were conducted by the evaluators in order
to acquaint the liaisons with the purposes of the evaluation, the design, the
purposes of the instrument and, on a practical level, the procedures they
were to use for selecting local samples and administering the questionnaires.
There were several pay-offs from these workshops. In addition to giving the
liaisons specific instructions about administering the surveys, the workshops
provided a valuable opportunity for the evaluators to meet the people from
the campuses, answer their questions, and secure their trust and coopera-
tion. In turn, their cooperation helped gain the interest and involvement of a
broad cross section of community college personnel, and as a result, al-
though the survey instruments were of necessity quite lengthy (ranging from
14-20 pages), response rates for all constituent groups were phenomenal,
ranging from 70 to 90 percent. All of the Basic Data Forms were also com-
pleted correctly, a feat not easily accomplished.

The Case Studies: At the heart of social programs is really the issue of in-
stitutional change, and the degree to which efforts at change succeed or fail.
Quantitative data alone cannot adequately determine the extent to which
any particular institution brings about change. In order to obtain this type of
information, the second phase of the holistic evaluation strategy consisted of
a series of site visits to 12 case-study colleges. Twenty-five colleges were
first nominated by the statewide office to represent diversity in terms of size,
region, number of colleges in the district, type of district (that is, single or
multi-campus), ethnic composition, average family income, ethnic com-
position of the surrounding community, and scope and emphasis of EOPS.
From this list, 12 colleges were selected as case-study sites, and all accepted
the invitation to participate. Preliminary visits were made to meet each
campus liaison, arrange for lodging, and clarify logistical arrangements for
the site-visit teams.

In keeping with the goal of involving different constituencies and persons
at different levels of decision making, nominations for site-visit team
members were solicited from some 420 community college personnel,
including superintendents/presidents, deans, faculty, vice presidents of
student services, heads of counseling, EOPS directors, and officers of the
state EOPS student organization.
In all, 497 persons were nominated to fill the 30 team positions: six teams
of five persons each, in addition to a member of the evaluation staff. Persons
who received three nominations or more were invited to indicate their will-
ingness to serve on the team, and 115 persons accepted. A final group of 30
persons was selected so that each team included a president, a dean-level
representative of student services, an EOPS director, a member of the
faculty, and a current or former EOPS student. Women and minority
persons were represented on each team and, with only two exceptions, team
members were assigned to site-visit campuses outside of their home regions.

There were actually several purposes of the site visits. First, as charged
by the Board of Governors, a major purpose was to describe the ways in
which each college implemented the activities and services designed to
achieve the objectives specified in the initial legislation. A second, related
purpose was to document how effective each college had been in achieving
those objectives. In order to provide the information necessary for program
improvement, it was also necessary to investigate the structural and staffing
arrangements of the program and program features and characteristics of the
college and community that appeared to be related to program effectiveness,
and to determine the functional relationship of EOPS to other programs
within the institution. Finally, an important purpose of the site visits was to
determine the extent to which data gathered in the surveys accurately
reflected conditions as observed by the site-visit teams and reported by the
different persons interviewed.

Holistic evaluation demands a delicately balanced investigation. Relying
too heavily either on a set of outcomes or the perceptions and opinions of
different groups may give a wholly unrealistic impression of the actual
program operation. How the staff and participants feel about a particular
program in which they are involved matters a great deal. Are the services
suited to their needs? Are they treated differently in other areas of the insti-
tution because of their participation? What are the physical arrangements for
the program? What factors seem to be most related to participant satisfac-
tion with the program? It is simply not possible to disentangle completely the
attributes of the process and the quality of the outcomes that they generate.
Program evaluation must include an understanding of the particular program
in the local sense, and such an understanding can only be gained by on-site
experience and systematic observation.

For example, an important finding that resulted from examining on-site 
structural and staffing arrangements was that on some campuses the EOPS 
offices were cramped, dismal, unattractive holes-in-the-wall located at ob-
scure corners of the campus far away from either the central administration
building or the social gathering area for students. Both EOPS staff and
students reacted verbally to this "second class" treatment. The condition
and location of the campus EOPS office, moreover, was consistently related
to the college's commitment to EOPS as well as its perceived value to the
faculty and staff, and this, in turn, was strongly related to students' satisfac-
tion with their college experiences generally and their experiences with
EOPS specifically. In other cases, the style and orientation of the EOPS di-
rector were related to both the direction of the program and students' at-
titudes.

In fact, information gained during the site visits demonstrated that the 
cluster of variables that came to be called a college's "emotional" commit-
ment to EOPS was often more important than its financial commitment in af-
fecting students' feelings of satisfaction and their social integration into the
college, one of the major objectives of the program. If these elements had
been omitted from the study, a valuable source of information, which in
many cases explained differences in outcomes and pinpointed areas needing
improvement, would have been lost.

Each team visited two colleges, spending two and one-half days on each
campus. Each visit was immediately preceded by an eight-hour orientation
meeting during which the evaluators clarified the purposes of the site visits,
the methodology and rationale for the interview schedule, and the general
procedure to be followed during the site visits. Each team member was given
a detailed outline of tasks and a set of questions to be investigated for each
task. Formal sessions were conducted with presidents, a cross section of
other administrators, faculty members, counselors, current and former
EOPS students, members of governing boards, and representatives of local
advisory committees, community schools and agencies. In the case of multi-
campus districts, a top-level representative of the district office was also in-
terviewed. At least two team members participated in every interview
session in order to assure inter-rater reliability. In addition to the formal
sessions, site teams observed the EOPS staff in action, chatted informally 
with students and staff at the tutoring and counseling centers, and generally 
observed the overall campus environment. 

The team members met for long sessions each evening to review and in-
tegrate their notes. A statement of major observations was presented to
officials of each college prior to the team's departure from the campus.
When all the site visits were completed, team members each drafted a profile
of the college's EOPS incorporating their own opinions as well as informa-
tion obtained from the interviews. Drafts were then compared and com-
posite profiles developed by the evaluation staff. Unlike most evaluation
reports, particularly those derived from experimental designs, data gathered
in the two phases of a holistic model are presented necessarily in a two-
volume report, with the first volume consisting mainly of analyses and in-
terpretations of quantitative data and the second volume containing the nar-
rative case-study profiles. A major weakness in holistic evaluations arises,
however, when the process data and outcome data yield contradictory in-
formation. This is a particularly difficult problem to resolve when the
balance between the two forms of data has been conscientiously maintained,
and the evaluators can only rely on their intuition as to which data are more
likely reflective of the "true" situation. The only solution, of course, is to
present both sets of data, acknowledge their differences and withhold judg-
ment unless a strong case can be made for the superiority of one set of data
over another.

In the EOPS evaluation, the confluence of findings between the survey
and case-study data was amazingly high. As a result, some information
gathered at the site visits was also integrated into the first volume where it
corroborated data gathered from the surveys or directly from the colleges. In
the few cases where the data were contradictory, both sets of information 
were presented and their sources identified. 

Summary: The strategy of involving a large number of people from the be-
ginning of the evaluation and of consulting with representatives from key
groups at different levels of influence and responsibility permitted a wide 
range of criteria for measuring program effectiveness to be included in the 
study and guaranteed that the evaluation was both site-specific at the local 
level and yet met the requirements of decision makers at the state level. An 
important offshoot of the "people-involvement" process was that it served
to reduce, and in most cases eliminate, people's fear of the evaluation and 
the outside evaluators. 

The fact that intensive site visits were made and case study descriptions 
prepared assuaged the concerns of "process" people who were initially sus-
picious of the evaluation. The quantitative and survey data gathered within
the context of the original program objectives garnered the support of the
outcomes-oriented cohort. As a result, the level of cooperation from all
groups was impressive.

Finally, and perhaps most importantly, the combination of case study and
objective-based data and the widespread participation of people in the study
as liaisons, consultants, Advisory Board members, and site-visit team
members had a significant effect on the use that has been made of the find-
ings and recommendations. When evaluation is part of a process of planned
change, the utilization of the findings in decision making is a key concern.
And when recommendations are based on multiple indicators gathered from
a wide variety of sources, there is little doubt as to their veracity and little
resistance to their implementation. The EOPS evaluation report never 
gathered a speck of dust. The statewide office moved to implement many of 
the recommendations less than a month after the report was completed, and 
several campus staffs began making changes based on suggestions made dur- 
ing the site visits even before the study was completed. 

There are many reasons why evaluation results are seldom used. Rarely 
does an evaluation study come up with a revolutionary and unequivocal set 
of findings that can be used to pinpoint exactly the areas needing change,
define what kind of change is needed, and estimate with complete accuracy 
the true worth of the program for all participants. More often than not, 
evaluations yield findings that can be interpreted to mean that in some cir-
cumstances, certain kinds of programs may be effective to some extent with
some kinds of people. Far from being definitive and unequivocal, the find-
ings are more often tentative, ambiguous, and site and time specific. Weiss
(98) suggests the following three conditions as contributing to the lack of 
utilization of evaluation findings: 1) the results do not match the information 
needs of the decision makers; 2) the results are not relevant to the level of 
decision maker who receives them; and 3) the results are ambiguous, and a 
clear direction for future programming is lacking. We suggest that still 
another reason for the infrequent use of evaluation findings may be that the 
recommendations and/or suggested directions are too massive, akin to
metamorphic change à la the CIPP model.

While it is still too early to tell what changes will be brought about by
legislative action as a result of the EOPS study, the fact that the statewide
office has already begun implementing several of the recommendations
made in the report attests both to the genuine concern on their part for
program improvements and to the fact that the changes suggested were
reasonable and practical.



Project Head Start: A Case of What Went Wrong 

Head Start is a large-scale, broad-aim, federally funded social-action
program in which a variety of services (instructional, medical, dental,
psychological, and nutritional) are provided for poor preschool children.
Head Start began in the summer of 1965. Like EOPS, it grew rapidly, and by
1967 approximately two million children, the majority of whom were from
minority backgrounds, had participated in the program.

Also like EOPS, the nature of the program and its goals posed many
difficult problems for evaluation. Since the program seeks to bring about
major political and social changes, its evaluation cannot be approached as if
it were a traditional program designed to bring about traditional, incremental
educational change. The goal is broad; the program is directed at millions of
children all over the country; program delivery varies greatly from
community to community; the program was not created locally, but by the
federal government; and the amount of money invested is enormous. Unlike
EOPS, however, evaluation was planned for from the beginning, and several 
evaluations were carried out by the program's Office of Research and 
Evaluation and its ^3 Evaluation and Research Centers in universities 
throughout the country. Still, from the beginning, evaluation met stumbling 
blocks. Most studies were local or regional, and it was impossible to
determine the extent of the program's overall effect or even the effectiveness
of the different types of local programs.

This case study concerns the national evaluation of Head Start conducted
for the Office of Economic Opportunity by Westinghouse and Ohio
University (14, 104). The study included a national sample, comparison
groups of nonparticipants, multiple measures of cognitive and affective
development, and an evaluation of program outcomes through the third
grade. The purpose of the evaluation was to make an overall analysis of the
program, providing information for policy makers to decide if the program
should be continued, modified, or if parts of it should be dropped. The
evaluation did not include investigating the effectiveness of local
implementation procedures or the delivery of program components.

The basic question that the study addressed was: Do children in the first, 
second, or third grade who have had Head Start experience, either summer 
or full year, differ significantly in their cognitive and affective development
from comparable children in those grades who did not participate? 

Sample: A national sample of 225 program sites was randomly selected for
study from the 12,927 Head Start Centers in operation during the 1966-67
school year. Only 104 centers were ultimately confirmed as investigation
sites due to the absence of appropriate control groups, lack of staff during
the summer phase of the program at some sites, and the fact that some of the
programs had been in operation for only one year. Other centers were
excluded because some of the schools in their target areas declined to
participate in the study.
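
The sampling figures above can be illustrated with a short sketch. This is not the study's actual procedure: the text says only that sites were "randomly selected," so simple random sampling without replacement, the numeric center IDs, and the seed are all assumptions made for the example. Only the three counts come from the text.

```python
import random

TOTAL_CENTERS = 12_927   # centers in operation, 1966-67 (from the text)
SITES_DRAWN = 225        # sites randomly selected (from the text)
SITES_CONFIRMED = 104    # sites ultimately confirmed (from the text)


def draw_sites(total, k, seed=0):
    """Draw a simple random sample of k hypothetical center IDs from 1..total."""
    rng = random.Random(seed)  # arbitrary seed, used only for repeatability
    return sorted(rng.sample(range(1, total + 1), k))


sites = draw_sites(TOTAL_CENTERS, SITES_DRAWN)
sampling_fraction = SITES_DRAWN / TOTAL_CENTERS
retention_rate = SITES_CONFIRMED / SITES_DRAWN

print(f"sampled {len(sites)} sites ({sampling_fraction:.2%} of all centers)")
print(f"confirmed {SITES_CONFIRMED} sites ({retention_rate:.1%} of those drawn)")
```

Under these assumptions, fewer than 2 percent of centers were drawn, and fewer than half of the drawn sites survived as investigation sites.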

Procedures: Fifty-five interviewers were recruited and given a one-week
training course to prepare them for the field studies. They were each
assigned to two sites, spending approximately three and one-half weeks at
each center and meeting with three groups of people during their visits: local
Head Start officials, school administrators, and parents of both Head Start 
and control-group children. The progression of their activity at each site was 
as follows: 

1. Interview the Head Start official.

2. Obtain a master list of pupils who had attended the center in the
specified program and year.

3. Visit the local schools and identify all Head Start children still enrolled.

4. Draw a random sample at each of the grades represented.

5. Consult with Head Start and school officials.

6. Study all available records to identify a control population, matching
each Head Start subject with a control subject on the basis of sex, race,
and kindergarten attendance.

7. Interview the parents or guardians of each Head Start and control-
group child.

8. Arrange for the testing of pupils to be conducted subsequently by field
examiners.

9. Write a field report and complete a questionnaire on field experiences.
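
The matching in item 6 can be sketched in code. This is a minimal illustration only, not the procedure the field interviewers followed: the record fields, the sample pupils, and the greedy first-available matching rule are all assumptions made for the example. The text specifies only the three matching variables (sex, race, and kindergarten attendance).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pupil:
    """Hypothetical school record holding only the three matching variables."""
    name: str
    sex: str
    race: str
    kindergarten: bool


def match_controls(head_start, candidates):
    """Pair each Head Start pupil with one unused candidate who has the
    same sex, race, and kindergarten attendance (greedy, first available)."""
    available = list(candidates)
    pairs, unmatched = [], []
    for pupil in head_start:
        key = (pupil.sex, pupil.race, pupil.kindergarten)
        match = next(
            (c for c in available if (c.sex, c.race, c.kindergarten) == key),
            None,
        )
        if match is None:
            unmatched.append(pupil)      # no comparable control on record
        else:
            available.remove(match)      # each control is used only once
            pairs.append((pupil, match))
    return pairs, unmatched


# Invented example data
head_start = [Pupil("HS-1", "F", "A", True), Pupil("HS-2", "M", "B", False)]
candidates = [Pupil("C-1", "M", "A", True), Pupil("C-2", "F", "A", True)]

pairs, unmatched = match_controls(head_start, candidates)
for pupil, control in pairs:
    print(pupil.name, "matched with", control.name)
print("unmatched:", [p.name for p in unmatched])
```

A pupil left unmatched in a sketch like this corresponds to the situation described earlier, in which sites were dropped for lack of an appropriate control group.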

The Head Start officials, school administrators, and parents were all very
cooperative. The only problems encountered with centers arose in those
cases where poor records had been kept; only 10 school systems were
considered uncooperative, although the cooperation of others reportedly
required exceptional diplomacy on the part of the field interviewers. Over 90
percent of the parents were reported to have been "very cooperative" or
"cooperative." The most serious problem faced in the study was finding
parents who had moved or been relocated by urban renewal projects. In
Appalachia, one field worker, mistakenly identified by a parent as a "revenue"
agent, was shot at! But other than this somewhat humorous incident (at least
in retrospect), surprisingly, everything went according to schedule, and
overall resistance to the evaluation was considerably less than the
investigators had anticipated.

Results: Briefly, the major findings of the study were that the summer
Head Start programs were not effective, and the full-year programs were
marginally effective. The major recommendations were therefore obvious:
The summer program should be phased out, and the full-year program
should be continued and improved. Very simple and very straightforward.
But, as many readers are aware, this evaluation and its findings became the
subject of a heated controversy that swept the country, damaging the
public's faith in national evaluations, and the residual effects remain to this
day.

The fires of the controversy were lit when the findings of the study,
presented as the first draft of the final report to the Office of Economic
Opportunity for review and comment, were released to the public prematurely.
The findings were in preliminary form and excluded several statistical
analyses that were subsequently added to the final report. To make matters
worse, these preliminary and incomplete findings were reported as definitive
to Head Start schools, officials, and concerned parents by the news media,
not the evaluators.

The attention focused on the study fanned the fires and served as a rallying
point for proponents of the program, who gained additional media time and
space to critique the study. The essentially negative findings of the
evaluation provoked local testimonials in defense of the projects, as well as
newspaper editorials and other reactions from those in the "early intervention"
philosophical camp. The study was scrutinized and attacked as no study had
ever been up to that time. Scholarly journals burgeoned, and conferences
overflowed with critiques of the methodology, statistical procedures, and
outcome criteria selected. Most of these criticisms dealt with the defects
inherent in studies of social-action programs (11, 16, 50, 80, 106), criticisms
that could be applied to many evaluations. In fact, McDill, McDill, and
Sprehe (53) question whether such strong criticism would have been
forthcoming from so many quarters if the evaluation had been more favorable.

Summary: This case study was not used merely to provide an example of
the failure of experimental design, but rather as a contrast to the previous
case study of the EOPS evaluation and to illustrate what can happen in the
case of a real-world evaluation that is potentially political and emotionally
volatile, and what pitfalls can be avoided. Cicerelli (15), too, has reflected
upon the study and suggests that if the principles of transactional evaluation
had been applied, much of the misunderstanding and conflict that ultimately
defeated the evaluation could have been greatly reduced, if not avoided
altogether. For example, the Head Start evaluation clearly threatened the
jobs of many people, and this issue should have been confronted. Although
the direct participation of all parties concerned would have been impossible
(if not logistically, at least economically), much more contact and "checking
in" with representatives of the various constituencies could have been
carried out throughout the evaluation. At the same time, although reaching
consensus regarding the criteria by which to judge the effectiveness of a
national program is equally impossible, greater efforts at including a broad
array of criteria, including some that were acceptable to each constituency,
could have been made.

Here again was a case of differing values and perceptions and the absolute
necessity of clarifying these differences prior to the evaluation. The
government and the evaluators agreed that the cognitive and affective aspects of
the Head Start children's development were the most important objectives
of the program. But others felt that the voluntary parental involvement and
the nutritional benefits gained by the children were equally important,
especially in the case of the summer programs. In fact, the same criteria were
used to evaluate the two separate program components, summer and
full-year, although both the objectives and the length of time available to work
toward their attainment were different for the two components. The centers
weren't consulted regarding the criteria by which the program was to be
evaluated, and yet the local programs varied depending upon what their
center's primary, secondary, and short- and long-term objectives were for
the program, particularly during its developmental stages. Many centers
may have been directing more attention and energy to other objectives in
response to local needs, not necessarily to the exclusion of, but in addition
to, the objectives defined for the overall program. Since they were not
consulted regarding their objectives and program emphases, valuable
information was missing from the evaluation.

Field interviewers were assigned to sites on the basis of their
complementary ethnic or racial backgrounds and/or their multi-linguistic abilities.
In addition, however, local people could have been effectively involved as
research coordinators to assist in the field work, just as local liaisons were
used in the EOPS study. As it was, communications between the center and
school staffs and the investigators were poor. Only the top administrators of
the centers and the schools knew about the study before the field
interviewers appeared on the scene; teachers, counselors, and parents were
almost totally excluded by design, if not intent. Local persons would not only
have been able to deal more effectively with the resistance to the study on
the part of local school staffs and improve channels of communication, but
they also could have been used to share some of the preliminary findings of
the investigation with local Head Start and school personnel and parents,
perhaps helping to avoid the furor created by the prematurely released
report.

Following the transactional model, protagonists and antagonists should
have been brought together both in the planning stages of the evaluation and
during its implementation. By getting these groups, or at least
representatives of them, involved from the beginning, their resistance to the evaluation
could have been stemmed. The purpose of the evaluation was to provide
information to decision makers regarding the future of the program.
Obviously, a decision to eliminate or greatly reduce a program such as Head
Start would be threatening not only to the target population but also to the
staff employed in the program. Transactional evaluation principles could
have been used to reduce the consequences of this threat by recognizing it,
bringing it to the surface, and enabling the different groups to confront their
conflicts and resolve them.

The unfortunate release of the draft report probably could not have been
avoided by any methodology or clever technique, but its impact would have
been reduced if representatives of the different constituencies had been
involved and if local persons had been participating in the evaluation. The
findings and recommendations might have remained the same, but the
different views would have been acknowledged, the sources of information
would have been apparent, and the intents and purposes of the evaluation
would have been clear.

Curriculum Evaluation

Schools are particularly neglectful of curriculum evaluation. This may be
due, at least in part, to the problem of defining what a curriculum is.
According to Stake (85), "a curriculum is an educational program." An educational
program is fairly easy to identify in the public schools, where one can define
curriculum as an integrated system of learning materials, activities, and
experiences. However, as Dressel (21) points out, "in higher education, the
meaning of curriculum is far less explicit." When the term is used in a
postsecondary setting, it can be referring to all courses offered in a particular
institution, to those contained within a particular department or field, or
even to an individual student's course of study.

There are also many different ways in which a curriculum can be
structured. It can be based on an assembly of courses that are deemed necessary
to meet certain job requirements; it can be formed from the basics of a
particular discipline or the specialized interests of the faculty in a
department; it can be designed to meet the needs of a professional or technical
program; or it can be developed as the result of a systematic specification of
outcomes (21). But, regardless of the way in which a curriculum is
developed, it must be updated and revised. It must therefore be periodically
evaluated.

Unfortunately, curricular change is seldom followed by rigorous
evaluation to determine its effectiveness; even more rarely is it preceded by a
systematic assessment of the actual need for changes or the directions they
might take. Responses to curriculum evaluation often take the form of either
cosmetic changes, defenses of the status quo, or both, since most
evaluations are designed to view the curriculum only in its own light without regard
for long-range school or program goals. Even where specific goals have been
defined, curriculum evaluations should not be based merely on their
attainment. The goals themselves must be evaluated in order to determine their
worth, relevance, and interrelationships within the context of both the
overall program and the system.

There are several problems that typically militate against systematic
curriculum evaluation. First, many faculty members view curriculum
evaluation as an imposition on their inalienable rights as teachers. In particular, if
the curriculum is based on their specialized interests, they view its content
and substance as sacrosanct. Evaluation implies judgment, and many faculty
are threatened by a process that may well point out deficiencies in
programming or areas in need of improvement for which they are responsible or in
which they are involved. If the results are negative or suggest changes with
which faculty do not agree, they will often simply not accept the results,
finding fault either with the evaluation or the people who conducted it.

Finally, although the motives for evaluation should always be scrutinized,
they are particularly important in the case of curriculum and instructional
evaluation. If the motives or reasons for the evaluation are not explained and
accepted by faculty, they may feel that there is some potentially harmful
outcome to be avoided; view the evaluation as "busy work" and not take it
seriously; or view the evaluation as "management ordered," and refuse to
cooperate. Resistance to change may be a contributor to any of these
problems, which may in turn be used by faculty as acceptable excuses for their
resistance.

Most, if not all, of these problems can be overcome and faculty can
become an important part of the evaluation process, assisting in the planning,
implementation, and analysis of results. The key is to involve faculty from
the beginning, discussing with them the reasons for the evaluation and the
potential payoffs from it and giving them time to become comfortable with
the persons who will be directing the evaluation. The following case studies
illustrate the problems and the successes of various forms of curriculum
evaluation.



An Evaluation of a Professional School Curriculum

Overview: A new assistant dean for academic affairs was appointed at the
school of dentistry, and his major concern was the curriculum. Although the
curriculum had changed over the ten-year period since the school was
established, the rapid growth in courses, students, and faculty had precluded
rigorous assessment of its effectiveness. The new dean had had previous
teaching and administrative experience at two of the most innovative dental
schools in the country, where evaluation was the basic ingredient of
educational improvement, and the school looked to him to direct the much needed
curricular revision.

The dean, in turn, contacted the authors to explore ways by which they
might assist him. At their first meeting, four weeks before the fall quarter
began, he clarified his intent. He wanted to know how effective the present
curricular structure and its offerings were in accomplishing the goals of the
school, meeting the needs of the students, and most important, preparing the
students to be practicing dentists. Our task was to draw up a plan for the
evaluation and present it to him in two weeks. If the plan was accepted, the
project would begin as soon as the school year started.

During the next two weeks, we examined the school's extant goals, re- 
viewed accreditation reports, and interviewed small, representative samples 
of both faculty and students. At the second meeting, an outline of the evalua- 
tion strategy and objectives for the project were presented to the dean as 
follows: 

1. To systematically develop measurable curricular goals for the school 
and departments based on graduate outcomes; 

2. To evaluate the attainment and relevance of these goals on the basis of 
actual graduate behavior and attitudes in their practices; 

3. To make appropriate changes in the curriculum based upon the
information gained from both the study of graduates and the goal
formulation process itself;

4. To assist faculty in planning, developing, and evaluating instructional
strategies relevant to the curricular and instructional goals; and

5. To establish an on-going evaluation program to facilitate a continuous
process of curricular change and renewal.

The initial analysis revealed that the school goals, as stated, could
contribute little, if anything, to an evaluation of the curricular program. Like
most so-called educational goals, they fell into two types of statements: "the
school will provide . . ." and "the graduates will be good dentists" (or the
equivalent). The former type of goal is met simply by providing whatever is
to be provided, and evaluation merely consists of a double check of that
provision. The latter type of goal is so global and vague that it is impossible to
measure its attainment; evaluation is equally impossible. Neither type of
goal is reflective of or dependent upon curricular practices. Clearly, the first
priority for the school was to establish goals that were specific, measurable,
and directly related to the curricular program. The objectives for the
evaluation were accepted by the dean, and that is when process became a high
priority as the foundation for the project.

The best laid evaluation plans can come to naught if the support of the
people involved is lacking. To be effective, curriculum evaluation in
particular must be conducted and perceived as a cooperative, collaborative
venture, not as an activity imposed upon the majority by a select group of
individuals. In this case, the faculty had had many negative experiences with
evaluation "experts" over the years, and there was little reason to assume
that they would cooperate. Their cooperation had to be earned.

The support of one very important person was obtained, but he did not
involve himself in the evaluation in any way other than approving the funds
necessary to conduct it. This person was the dean. He felt that the
curriculum of the school was taught by the faculty for the students, and that
they (the faculty and students) should together analyze it and recommend
changes within a supportive, but neutral environment. He introduced us to
the entire faculty at the first fall faculty meeting, reiterated his complete
support of the project, and did not ask for, or receive, any further
communication from us during the entire first year.

The Chronology of the First Year: In order to establish the process for
developing the school and departmental goals, as well as mechanisms for
evaluating their attainment, it was agreed that an existing faculty committee
would work with the evaluators, rather than a new, additional structure
being created for the project. The standing curriculum committee appointed a
subcommittee composed of an administrator, two faculty members, and one
student.

In order to provide a framework for the evaluation, the following assump- 
tions were defined at the first working meeting: 

1. The goal of curricular renewal is the improvement of teaching and 
learning. 

2. Any really meaningful changes in the curriculum and, ultimately,
improvement in the teaching-learning process, must be fully integrated
with a rigorous, comprehensive evaluation strategy.

3. The focus of evaluation must be on outcomes, in terms of student
achievement and satisfaction; faculty motivation, development, and
satisfaction; responsiveness of course offerings and curricular
sequencing; and, finally, outcomes in terms of the total school
environment.

At that same meeting, the subcommittee reviewed and ratified the
objectives of the project and agreed upon the procedures that would be used to
accomplish them. The first step was to solicit ideas for first-order school goals
through interviews with the faculty and students, and on the basis of these
conversations, each committee member would generate a list of tentative
goals for consideration.

Although the components that comprise a measurable objective were
reviewed at the meeting, the committee members returned for the subsequent
meeting two weeks later with a lengthy hodgepodge of vague ideas and
global, motherhood-type statements similar to the vacuous descriptions of
most school catalogues. Several hours were spent distilling their essence,
collapsing them, and rewording them into goals that were at least
semi-measurable and based upon graduate outcomes. During the following week,
the goals were further refined, and presented to the committee for review.
Changes in wording were explained, and approval was obtained for each
change. When all of the goals were in acceptable form and accurately
conveyed the intent of their "authors," it was agreed that they should be
circulated among the faculty and students to gain their reactions and acceptance.

The tentative goals were sent to every full-time faculty member and to 25
percent random samples of the student body, each drawn representatively
from all four class levels. Everyone was asked to review each goal and
suggest criteria that they would accept as evidence of its achievement.
Response rates were 90 percent from the faculty and 80 percent from the
students. Consensus on the goals ranged from 75 to 95 percent for the faculty
and from 80 to 95 percent for the students, and many suggestions for
measurement criteria were obtained. Some of the respondents also
suggested rewording goals they agreed with in essence, and some suggested
additional goals for consideration. The tabulated results, along with the
suggested word changes and lists of criterion measures, were circulated again to
get another reading on the goals and a first reading on the criteria. This time,
part-time faculty and faculty who held joint appointments in other schools
were also included.

Again, responses and consensus were overwhelming, both for the goals
and the measurement criteria. The criteria were further refined and sent out
once again. Then, only those objectives and measurements which received
over 75 percent agreement were adopted as first-order goals of the school,
the cut-off point previously agreed upon with the committee and the faculty.
Everyone agreed that acceptance of the goals by at least three-quarters of
the faculty would minimize the possibility that a vocal minority would
prevent their attainment.
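
The adoption rule can be written out as a small sketch. The goal labels and agreement figures below are invented for illustration; only the 75 percent cut-off comes from the text. The text says both "over 75 percent" and "at least three-quarters," so the strict comparison used here is an assumption.

```python
def adopt_goals(agreement, cutoff=0.75):
    """Return the goals whose share of agreeing respondents exceeds the cutoff.

    `agreement` maps a goal label to the fraction of respondents who agreed.
    The comparison is strict ("over 75 percent"); this reading is an assumption.
    """
    return [goal for goal, share in agreement.items() if share > cutoff]


# Hypothetical tabulated results from one round of circulation
tabulated = {
    "Goal A": 0.92,
    "Goal B": 0.75,   # exactly at the cut-off: not adopted under the strict reading
    "Goal C": 0.81,
    "Goal D": 0.60,
}

print(adopt_goals(tabulated))
```

Passing a different `cutoff` models a group that sets its own threshold for adoption.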

The same process of goal formulation was then instituted for each of the
school's sections (units equivalent to departments), except in this case,
section representatives formed the working committees, and each section
defined its own cut-off point for goal adoption. A few were lower than 75
percent, but most were higher. All of the faculty in each section were
involved in the process, and measurable objectives and criteria were
established for each section that were congruent with and supported the goals
established for the school at large.

The first year of the project was thus completed. The school and each of
its sections had a set of objectives and criterion measures to assess their
attainment, and a process for curriculum development and evaluation had
been established. At many institutions this would have been a one-month
project. Why had it taken so long in this case?

In Retrospect: Completing the initial stage of the project took the better
part of a year, but we firmly believe that the slow movement through this
phase was essential for several reasons. Many of the faculty were far from
receptive to the project from the beginning. They had experienced too many
simplistic workshop overviews of objectives. They had been required to
write "behavioral" objectives for their courses, but few faculty saw the
relationship between the objectives and their teaching. Once written, the
objectives were filed away only to be brought out for periodic accreditation visits.
Since few faculty actually used their course objectives, the relationship
between school and section goals and the curriculum was very remote.

A second reason for the slow progress was that some of the faculty who
favored the project from the beginning were supportive for the wrong
reasons. Anticipating minimal cooperation from other faculty, they saw this
project as a way to railroad pet goals into the curriculum and thus obtain
more curricular hours for their section. In the early stages of the project, it
was evident that for some people an important measure of professional
worth lay in the number of curricular hours for which they were responsible.
A recurrent theme in introductory conversations was: "Hi. My name is Dr.
So and So (no one ever had first names); my section has sixty-three
curriculum hours. The national average is forty-two, you know."

In order to counteract this attitude and yet gain the faculty's cooperation,
we spent the first two months of the project doing little more than visiting
faculty in their offices, chatting with them in the halls, and having coffee and
lunch with them in order to get to know them, explain the purposes of the
project, answer their questions, and slowly gain their support. The time was
not ill spent, in spite of the fact that one of us gained ten pounds and the
other became allergic to coffee. Many faculty simply needed to get to know
the evaluators (and in some cases, judge them) on a personal level first and
as evaluators second. Others needed to get their "air time" to present gripes
about the school and/or to demonstrate their own expertise as
educators/evaluators. Oddly, the fact that we knew nothing about dentistry was
never raised. But through these formal and informal visits, the purposes of
the project were conveyed to the faculty, and they began to accept the fact
that there was no hidden agenda, that we could be trusted, and that there
would indeed be an evaluation that they would help design and conduct. The
formal process of goal setting and review could then begin. But soon after,
the project ran into its second slowdown.

As the faculty became convinced that they really would be responsible for
establishing the curricular direction of the school, they developed an almost
insatiable thirst for information: How can I be sure that the objectives for my
section will be good? How do we know our tests are fair? How do we
evaluate clinical performance? How can I be sure that my instructional
materials and methods are adequate? In response to these requests, and with
the support of the still-invisible dean, we conducted a series of seminars and
workshops ranging from methods-type classes on instructional techniques
and student learning styles to workshops on test construction and clinical
evaluation.

As a result, by the end of the first year, in addition to the goals and criteria
that were established for the school and sections, a learning environment
had been created in which teaching received major attention. None of this
would have happened if the faculty had not been developing their own goals
with an eye to attaining them through teaching.

The goals certainly could have been stated better if they had been written
by, or purchased from, professional educators. And they would have been
written in much less time and probably for much less money. But they might
have ended up on the proverbial shelf along with the other goals and
objectives that had been lying there for years. The purity of measurable objectives
had been violated. But, while classically imperfect, their intents were
adequately conveyed and faculty were committed to working toward their
attainment. Garnering faculty support was far more important for the future of
the project than producing classically perfect objectives. There was ample
opportunity to rework the objectives later; there was only one opportunity
to gain the faculty's trust. So the first year ended far behind schedule, but
way ahead in support.

The Second Year: The next phase of the project was inaugurated by having
each section present their objectives to the rest of the faculty,
demonstrating how they contributed to the overall school goals and
complemented and expanded upon those of the other sections. Many overlaps and
gaps were identified, and ad hoc joint section committees were set up to
investigate and explore solutions.
Each section had been asked to send one representative to these
information-sharing sessions. If all had complied, that would have meant an
attendance of 23. However, so many people were interested that an average of
seventy came to each of the first three meetings. Also, a 16-hour course on
criterion-referenced measurement was offered and 27 faculty attended. A
series of mini-courses on "What We Wrote As Objectives But Never
Learned Ourselves" was introduced, and 40 people came to the first session.
Since there were only about 60 full-time faculty (and 140 part-time), this
represented an amazing show of support for in-service training. In addition,
complaints were voiced by faculty who had classes or laboratory sessions
which conflicted with the hours of the workshops and seminars. As a result,
and at the request of the faculty, the dean designated one-half day each week
as "Faculty Development Day." Classrooms and laboratories were closed
and the school turned into an instructional laboratory. More teaching
improvement classes were introduced, as well as several discipline-oriented
continuing education courses. Faculty attendance, which was always
voluntary, hovered around 90 percent.

And where was the curriculum evaluation that had started all of this
activity? Actually, it was all over the place. After hearing about a particular
instructional principle called appropriate practice at one of the classes, one
professor cancelled a lecture and took his students to an empty laboratory to
practice. Another took his students out of a laboratory where they were
practicing something he decided wasn't all that important. Still another
faculty member decided that his age-old practice of giving a quiz at 8:03
every morning, for no other purpose than monitoring attendance, could be
dispensed with. Faculty were reexamining their objectives, not necessarily
with an eye to changing them, but to understanding their full implications for
the classroom.

Faculty were talking to each other about their teaching and how their
students were progressing toward particular objectives; curricular hours
were seldom mentioned. Moreover, somehow, coincidentally, first names
surfaced, and the unfriendly atmosphere seemed to disappear. Many of the
faculty began to work with us on a number of special projects. Some
developed self-instructional materials that contained outrageously irreverent
cartoons and humor (which the students loved, and learned from); others
helped us prepare an objectives-based questionnaire for the graduates.
Change was taking place, albeit somewhat less systematically and more
serendipitously than had been planned. Evaluation and change had become
a cyclical process. Evaluation served as the incentive for change; change, in
turn, necessitated evaluation.

The Survey: The foundation of the formal evaluation was the graduate
questionnaire, and again, everyone was involved in its development. The
40-page compendium of objectives was analyzed, and items were developed for
each objective specified for the school and the sections. A draft was
approved by the committee and distributed to the faculty and student body for
their review. This was not a typical alumni questionnaire full of "Where are
you, what are you doing, and what did you think of your education?" It was
entirely dependent upon the specified measurable objectives. Because of its
length, however, the faculty and students were asked to designate items that
they considered "absolutely essential" so that two versions of the
questionnaire, with these as common items, could be devised, thus keeping the
length manageable for any one respondent. Each questionnaire still ended
up being 17 pages long. They were sent to every person who had graduated
the previous year, and over 60 percent were initially returned. A follow-up
letter increased the response rate to 85 percent.
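
The two-version design described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the text does not say how the non-essential items were divided between the two versions, so the alternating split below, the function name `build_two_forms`, and the item labels are all hypothetical.

```python
def build_two_forms(items, essential):
    """Split questionnaire items into two forms that share the essential items.

    Assumption: non-essential items are simply alternated between the forms;
    the original project may have allocated them differently.
    """
    essential = set(essential)
    common = [item for item in items if item in essential]
    remaining = [item for item in items if item not in essential]
    form_a = common + remaining[0::2]  # every other non-essential item
    form_b = common + remaining[1::2]  # the rest of the non-essential items
    return form_a, form_b


# Hypothetical item pool: ten objectives, three rated "absolutely essential".
items = [f"objective_{n}" for n in range(1, 11)]
essential = ["objective_2", "objective_5", "objective_9"]
form_a, form_b = build_two_forms(items, essential)
```

Each respondent then answers one form only, while the common items still yield responses from the whole sample.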

We had intended to codify the data, analyze it, and prepare a written
report on the findings, but the faculty were impatient. They were determined
to have an immediate look at the information, so the data were simply
tabulated and sent to them. At the same time, a formal analysis was prepared,
but the faculty didn't seem to need it.

For the next few weeks, everyone was talking about the survey results. 
Special section meetings were called to discuss the implications. Requests 
for curricular changes were brought to appropriate committees and moved 
through the formal structure. Informal changes took place immediately in
the classrooms. Timidly at first, faculty began to ask if their objectives were 
"revisable." Some were quite concerned that their most cherished objec- 
tives were being ignored by the students once they graduated. The purpose 
of the survey was to find out what graduates were actually doing in their 
dental practice. Now that the faculty had that information, they had several 
alternatives: they could eliminate or change their objectives; they could in- 
stitute measures designed to ensure that more graduates would accept the 
importance of their objectives and honor them in practice. They could also 
do nothing. It was up to them; they had the information, and they now had
the skills. 

Summary: This was a curriculum evaluation, but it contained many more
elements than are typically found in such endeavors. Some would separate
the project into faculty development, instructional improvement,
organizational development, evaluation, or other categories. The project may
have encompassed these components, but the focus was upon one goal: the
improvement of teaching and learning. In this case, these elements were
companion activities necessary to accomplish the goal.

The curriculum evaluation was and continues to be a success. Student and
faculty evaluations of the project were extremely favorable. They were
pleased with the processes, the learning outcomes, and the results. The
project has been institutionalized and is now in its fourth year. As intended,
we, as external evaluators, became dispensable; the faculty continued the
process of change and evaluation.

The following principles summarize what we believe are the major reasons
for the project's success:

1. There was strong administrative support coupled with very low
administrative visibility.

2. The evaluators were allowed flexibility in the initiation and evolution
of the project (for example, the two months spent "setting the stage"
and the ability to add elements to the project at the request of the
faculty).

3. The evaluators were there as objective, external-change agents, 
removed from internal politics and with no ties to any particular 
constituency. 

4. The project was designed to respond to the immediate concerns of 
faculty, giving it a credibility and an influence necessary to confront 
the more complex and comprehensive changes to come later. 

5. A major standing committee was used to help plan and implement
each stage of the project. For this reason, faculty did not feel that
something foreign was being imposed on them.

6. Faculty (and students) were involved in the conduct of the project 
from the beginning and had basic control over its direction and out- 
comes. 

7. Faculty were offered training in the skills required for full
participation in the project. Those who wished to learn more were trained as a
cadre of "in-house experts" to work with others on an individual
basis and lead courses and workshops themselves.

8. The evaluation staff and budget were kept to a minimum. The two
external consultants were augmented by resources already there. The 
dean's secretary arranged all meetings and schedules; other tasks 
were assumed by faculty and students. 

9. Evaluation was fully integrated into all aspects of the teaching/learn- 
ing/management processes of the school; it was not a mere append- 
age. 

10. The mechanisms, processes, products, and outcomes of the project 
were fluid. 

An Evaluation of a Public School Curriculum 

Overview: A small, relatively isolated, politically conservative rural
community had been transformed, because of its accessibility to a major
expressway and a building boom that spanned a ten-year period, into a large
suburban community with a great mix of people who held opposing views
regarding educational philosophy in general and the curriculum of the schools
in particular. When dissatisfaction with the schools grew to such a point that
school bonds failed to pass, a group of parents and community residents, in
cooperation with the school board, initiated an evaluation of the school
curriculum and the district's policies, and also contracted with external
consultants to conduct a separate "objective" evaluation (22).
The objectives of the external evaluation were: 

1. to examine and document the competing values of various community
groups;

2. to determine areas of agreement and disagreement among community
residents and examine how these shaped the school program and
affected school policy; and

3. to formulate plans to enhance school-community relations, reduce the
[illegible in source], and improve educational opportunities for all
students.

Procedure: Although the term was not used, the basic principles of
transactional evaluation were employed by the external evaluators, since the
evaluation called for the confrontation and resolution of conflict. The first
step was to identify the divergent goals, educational philosophies, and
attitudes present in the community. Three activities were initiated in order to
gain this information.

First, a mail survey was conducted by the citizen evaluation committee in
which residents were asked to rate on a five-point scale the degree to which
they agreed or disagreed with statements regarding educational philosophy,
school goals, school programs and policies, physical facilities, school-citizen
communication, and taxes. They were also asked to indicate what elements
they were most and least satisfied with in the schools. Second, in order to
determine discrepancies and congruencies between parents, teachers,
school administrators, and non-parent residents regarding the purpose of
education, 36 members of the lay citizens' committee, 35 teachers, and 2
administrators were asked to rank 106 educational goals listed on a form
commonly used throughout the country (40). Finally, formal classroom
visitations were conducted in each class in the district, as well as some
classes in a nearby district for comparison, by both members of the lay
citizens' committee and the external evaluation team.

Responses to the mail survey were examined according to age groups,
length of time residing in the community, and whether or not respondents
had children in the school system. Two distinct value systems clearly
emerged that were classified according to Spindler's (81) definitions of
traditional and emergent values. Spindler defines traditional values as those
which emphasize thrift, self-denial, postponement of satisfaction, success
and a belief that the means to it is hard work, absolute morals, and elevation
of the individual as an end rather than the group. Emergent values are
defined as those which emphasize sociability, sensitivity to the feelings and
needs of others, a relativistic attitude toward moral norms, and a
here-and-now orientation that reflects uncertainty about the future (81).

Traditional values were held by residents who had lived in the community 
for more than ten years before the changes in the community and the schools 
had taken place and by those who did not have children. People who had
lived in the community for less than ten years and those who had children 
enrolled in the local schools held more emergent values. The two groups of 
residents who were at odds with each other were clearly identified. Break- 
downs by age showed no significant differences. Although the beliefs of 
residents who had children in the schools might be considered more relevant 
and thus more important, all citizens are entitled to vote, and both board
memberships and school bond issues are decided by and subject to the input
of all. Further, pressures on administrators come from all quarters. 

The results of the goal ranking demonstrated that the parent/citizen group 
ranked academic skills and their relationship to everyday life the highest, 
while teachers stressed creative, affective, and artistic goals. Some areas of 
agreement did emerge between these two groups; the development of
self-esteem and knowledge of sociology and citizenship were important, and
religion was least important for both groups.

During the site visits, parents and other citizens observed educational
practices and found that many of the negative issues raised regarding the
schools were grossly exaggerated, and in some cases nonexistent. The
majority of teachers were in fact emphasizing rather traditional values. The
consultants corroborated these perceptions. In addition, standardized tests
were administered to students by the external evaluators, and it was found
that the mean achievement scores were at least at grade level for all grades
and considerably above grade level for the majority of grades. This very
traditional measure of achievement satisfied everyone that the students were 
receiving a quality education, a point that illustrates the weakness inherent 
in a noncomparative evaluation. The primarily middle-class students were 
no doubt above average, and there is no way of knowing without a compara- 
tive evaluation whether these students would have scored higher had they 
been learning under another set of conditions. 

Although some weak points were identified by the classroom observations,
ideological judgments were largely replaced by data or perceptually
based information. Specific practices were examined more as to their
effectiveness than as to how much they conformed with value structures. As
Hash et al. (22) summarized, "it soon became evident to all but the most
hardened ideologists that the earlier assumptions were too broad, often did
not correspond to the facts, and were untenable as a basis for making policy.
Ideological rhetoric was reduced, the climate for teaching and learning was
enhanced, and a better relationship with a more informed community
resulted."

Although the study did have some methodological limitations, seven
major outcomes were attributed to the project (22):

1. Greater community interest resulted in a larger turnout for school
board elections.

2. School board members were more carefully selected and candidates for 
the board made more effort to inform the community of their stand on
specific issues. 

3. Positive community interest in the schools increased as did a readiness 
to contribute to as well as critique school activities. 

4. Citizens were better informed and demanded increased communication
with school administrators and teachers, as well as a more systematic 
organization of school curriculum. 

5. Demand increased for accountability of school administrators to both 
citizens and teachers regarding curriculum, student achievement, and 
finances.

6. Antagonism between teachers and citizens over the purpose and
organization of classroom instruction was reduced.

7. A citizens' advisory committee was established to work directly with 
the school board and to serve as a source of input for citizen opinion. 

The most obvious reason for the positive results that came from the
project was that the citizens of the community were deeply involved in both
the planning and implementation of the evaluation of their schools. Possibly
because the external consultants were conducting a companion study, the
citizens' group steadfastly attempted to make their study as valid as possible
so that their recommendations to the school board would be received with
equal weight. As a result, recommendations from both groups were very
similar, although the external evaluators called for more extensive changes
than did the citizens.

Facts that were brought into the open replaced ideological rhetoric that
had previously kept the two factions of residents from agreeing on school
policy. But it was not just the information that helped resolve the problems
in this community; it was the manner in which the data were collected,
analyzed, and reported.

Needs Assessments

The process by which one identifies needs and decides upon their priority
has been termed needs assessment. A need may be defined as a condition in
which there is a discrepancy between the actual or observed state of affairs
and a desired or acceptable state of affairs (3). In the educational world, this
discrepancy can be determined by objective measurement (for example,
fourth grade students are given a test to measure their skill in mathematics,
and the results are compared with a set of standards expected for children in
the fourth grade). The extent of the discrepancy may also be estimated
subjectively (for example, a group of "judges" observe the operations of an
institution or a particular program and collectively decide what the needs seem
to be). In both cases, decisions concerning the desired standards and the
degree of need involve value judgments.
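
The definition of a need as a discrepancy between an observed and a desired state can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the skill areas, the scores, the standards, and the choice to rank needs by the size of the gap. As the text notes, setting the standards and deciding priority remain value judgments that no calculation settles.

```python
def discrepancy(observed, desired):
    """Return the shortfall of the observed state from the desired state.

    A result of 0.0 means the standard is met or exceeded (no need).
    """
    return max(0.0, float(desired) - float(observed))


def rank_needs(observed_scores, desired_standards):
    """List (area, gap) pairs for unmet standards, largest gap first.

    Ranking by gap size is an illustrative assumption, not a rule from the text.
    """
    needs = []
    for area, standard in desired_standards.items():
        gap = discrepancy(observed_scores[area], standard)
        if gap > 0:
            needs.append((area, gap))
    return sorted(needs, key=lambda pair: pair[1], reverse=True)


# Hypothetical fourth-grade mathematics results (mean percent correct).
observed_scores = {"computation": 62, "fractions": 55, "measurement": 78}
desired_standards = {"computation": 80, "fractions": 75, "measurement": 70}
needs = rank_needs(observed_scores, desired_standards)
```

In this made-up example, measurement exceeds its standard and so does not appear as a need, while fractions and computation do.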
The following case studies briefly describe two needs assessments
conducted in different settings and, accordingly, using different procedures. 

A Needs Assessment of a Professional School:* A general feeling of
stagnation existed within the school, and both the faculty and administration
were dissatisfied with the quality of education being provided. The faculty
was also splintered, and there was no consensus regarding either the reasons
for the lack of vitality or ways by which the situation could be improved. In
an attempt to bring the conflicts more clearly into focus and begin to develop
solutions, the authors were asked to conduct an organizational diagnosis and
needs assessment.

The objectives of the project were threefold: 1) to identify critical organi- 
zational and curricular problems that directly and indirectly affected the 
functioning of the school and the quality of its educational program; 2) to 
recommend appropriate entry points for intervention strategies that would 
most effectively redress the problems identified; and 3) to design a program
for planned change and institutional renewal that could serve as a basis for 
on-going evaluation and continuous improvement of the quality and effec- 
tiveness of the organization and its instructional program. 

The needs assessment was conducted by a team of three external
consultants who spent one week at the institutional site. As part of the
analysis, they worked with faculty, administrators, and students, helping
them clarify the reasons for their dissatisfaction, identify points of conflict,
and explore possible strategies that would lead to their resolution.

A variety of procedures were used to gather the information necessary for
the diagnosis and analysis: semi-structured interviews and informal
discussions with individuals and groups of faculty, students, and administrators;
direct classroom observations; and document analysis. The documents
provided background information regarding the history of the school and critical
events in its growth; the observations and informal conversations enabled
the team to explore issues in greater depth and to discover issues that had
not surfaced in the interviews.

*The type of professional school in which this needs assessment took place has been withheld in accordance
with the wishes for anonymity on the part of the school staff.

Intensive, two-hour interviews were held with approximately 75 percent
of the full-time faculty in groups ranging from four to six participants. Each
group was selected to be representative of different curricular areas, varying
levels of faculty rank, and tenure and length of time at the school. All of the
major school committees were represented in the interview groups. In
addition to the meetings with faculty, special interview sessions were held with a
group of students representing all class levels and with the dean and his
administrative associates. An informal "drop-in" afternoon session was
reserved for people who had not been involved in the scheduled interview
groups, could not attend at their scheduled time, or wished to talk further.

The series of questions asked by the team was similar for all groups. No
attempt was made to interpret the answers or influence the direction of the
responses by delimiting the scope of the questions or channeling
respondents' answers, even when they were being heard for the fortieth
time. One exception was that discussion about problems related to facilities
was discouraged, since a new, well-equipped building was under
construction and would soon be completed.

Throughout the entire data gathering process, the focus and intent was on
exploration and discovery. The goal was to find out what the school was like
and how it was perceived by both staff and students. The reality of the
organization as defined by its members was the primary concern, since their
perceptions influenced the school's functioning and atmosphere. The
procedures established at the interview sessions reflected this perspective;
that is, the interviewees were the experts as far as organizational operation
and functioning and the school's programs and processes were concerned.
The evaluators were there to learn about the school, listening carefully to
what people said and observing how they interacted with each other.

Although the make-up of the groups obviously differed greatly in many
respects, there was almost complete agreement as far as descriptions of the
school's operation and functioning and identification of major problems and
suggestions for their resolution were concerned. This was in marked
contrast to the belief that there was conflict, the assumption on which the
call for a needs analysis was based.

The major problems raised by all of the people interviewed, and
corroborated by the team's informal conversations, observations, and
classroom visitations, centered on the pervasive lack of communication
between and among the different constituencies, the fragmentation of the
curriculum, the faculty's lack of training in teaching methodology,
inadequacies in the testing and grading system, and the lack of administrative
follow-through. This last point is particularly important.

The interviewees commented repeatedly about the lack of follow-through
that characterized the modus operandi of the school. A workshop was
presented; a project was started; ideas were generated and accepted. But no
support was ever provided for implementation, and faculty who became
excited about a new project or idea for change soon became disillusioned. The
faculty's skepticism regarding the prospect of change carried over to the
needs assessment project and, although they were cooperative, having
decided to "give it one last try," many expressed doubts that any changes
would be forthcoming.

In this case, however, they were wrong, and they were rewarded for their
efforts. As recommended in the needs analysis report which was circulated
to all faculty, the dean approved the initiation of a long-term program of
planned organizational and curricular change and evaluation. The program is 
designed to address the problems identified and contains those components 
suggested by the faculty and students who had been interviewed. The school 
was diagnosed to be a closed system, one that would increase in entropy and 
disintegration. The goal of the change program is to help the school become 
more of an open system by establishing a process for a cycle of self-gener- 
ating change and evaluation. 

The scope of the program is broad, and its chances of success will be
enhanced by the continuing involvement of many people. Although it is not
far removed from the dean's perception of what a change program might be,
it is not his plan. It belongs to and will be implemented by all three of the
school's constituencies: the faculty, the students, and the administration.

A Needs Assessment of a Faculty Development Program: A large com- 
munity college district had established an instructional grant program that 
provided funds for the development of innovative approaches to teaching 
and instruction. Faculty in the district could write a proposal and, if selected 
in the competition, obtain funds to develop their design. The program had 
been well received, and many faculty undertook a variety of projects. Al- 
though the program had been operating for a number of years and had 
proven to be an excellent device for motivating faculty to examine their 
teaching, the director was concerned about the quality of the instructional 
products that were being developed. Funds for field testing were not
available, so rather than implementing a formative type of evaluation, the
director asked external evaluators to conduct a needs assessment of the program
to determine whether faculty would benefit from a special course designed to
teach them the principles of instructional design, product development, and
evaluation.

The first step was to review all of the project proposals that had been
funded as well as all interim and final reports in order to identify the nature
and objectives of each project. The investigators then combined objective
measurement and case study procedures in a holistic approach to the
assessment, analyzing the extent of the faculty's skills in instructional design and
product development. The assessment approach included a short objective
test given to a random sample of faculty in each of the colleges in the district
and a series of on-site interviews with every faculty member who had
received a grant since the program had begun and with the local campus
administrator who was supervising the program. Completed products were
reviewed for content and evaluated for face validity according to the
principles of instructional design. In the few cases in which student performance
data were available, products were evaluated for their effectiveness in
promoting student achievement and/or motivation.

The faculty's level of skill in instructional design and product
development was assessed on the basis of several types of data and data sources,
and discrepancies did indeed exist between their knowledge of instructional
design and the quality of their products. On the basis of these findings, the
investigators recommended that, as a condition of receiving a grant, faculty
should participate in a special program designed to teach them the basic
principles of instructional design, and that such a program should be
developed before the following year's grant program was initiated. It would
have been unfair to discontinue the program on the basis of the needs
assessment. That was not its purpose. The program had increased faculty
motivation to improve instruction and had rewarded those who tried to do so, and
these were specified goals of the grant program. The addition of an
instructional design component in conjunction with the grant program served to
increase the chances that the resulting products would be effective and of
high quality.



Summary

The case study descriptions of evaluations presented in this chapter
demonstrate quite clearly that there is no one cut-and-dried method for
conducting them; nor is there one "best" approach to evaluation for all
situations. They also provided examples of the variety of real world settings as
well as the array of methodological choices available to the practitioner.
Particular models were not imposed as a basis for generating evaluation
questions; nor were evaluation designs picked out of a hat. Each approach
necessitated a design that would address the information needs and
questions required of the evaluation and be appropriate to the particular program
being evaluated.

Very often, the evaluation questions that need to be answered require
experimental design. Is textbook A better than textbook B? Can students learn
as well in condition A as in condition B? Does the program effectively
accomplish our goals? Social action programs and organizational change
programs, as illustrated here, are best suited for integrated types of approaches
such as transactional evaluation and holistic evaluation. Evaluators must
proceed in developing their designs much as a gourmet chef might go about
concocting a new, delectable dish: selecting a bit here and a handful there,
a dash of this and a pinch of that, and combining the ingredients into a design
that is suited to the particular program and its requirements and constraints.

We have talked repeatedly of selecting designs and approaches that are
suitable to the purposes of the evaluation and the information needs of
decision makers. But most real-world evaluations are constrained by the
program setting, the budget provided for the study, and the time frame
within which the evaluation must be conducted. Money is never mentioned
in graduate seminars or in-service programs on evaluation. Yet, the truth of
the matter is, despite the growing reverence for evaluation qua evaluation,
few budgets allocated for it are sufficient to permit a thorough, rigorous, and
comprehensive investigation. More often than not, evaluation designs are
the result of compromises necessitated by limited funds and/or limited time.
The case studies were selected to provide examples of these very real
problems.

It should be apparent from a comparison of the case studies that holistic
evaluation and transactional evaluation have many basic similarities. There
are also important differences that may well become more pronounced as
both approaches are refined through continued use in various settings. The
common threads of holistic and transactional evaluations are: 1) persons
representing key constituencies at different levels of the program and
different levels of power to influence the program directly or indirectly are
involved from the beginning; 2) multiple measures are used, including
quantitative data and qualitative information obtained from observations and
interviews; 3) there is a concern for both process and outcome beyond attainment
of pre-specified objectives; 4) the study of actual outcomes is combined with
naturalistic observations of what was delivered and how people interacted;
5) predetermined goals are not required nor are alternative causal
possibilities eliminated in the analysis without sufficient examination; 6)
experimental design can be incorporated, but where this is impossible or
impractical to implement, other designs can be adapted; 7) evaluation can be
viewed as either a continuing part of management or as a short term post hoc
analysis; and 8) evaluators can serve as part of the program staff or as
external evaluators outside of the program or organization. Both approaches
are eclectic and flexible, and are adaptable to the needs and requirements of
the particular program being evaluated and the particular information needs
being addressed. They are pragmatic, common sense approaches to program
evaluation that provide comprehensive information acceptable to many
different constituencies and useful to many different decision makers at many
levels of power.

The strategy of involving different people from the beginning of the
evaluation, including some people who are antagonistic to the program or
may become so, is an important part of both approaches because
transactional evaluation is concerned with the resolution of conflict. In the absence
of controlled experiments, the participation of program opponents increases
the likelihood that biases in favor of the program will be balanced and
outcomes credited to the program will be verified. Another benefit is that initial
meetings with representatives of different groups serve to introduce the
evaluator to a broad cross section of key decision makers. The evaluator, in
turn, can use these opportunities to explain the purpose and needs of the
evaluation, answer questions, involve people in the process, and try to
garner their support and cooperation.

Generally speaking, neither holistic nor transactional evaluation costs
more than a traditional design, and they may well cost less than large-scale
experimental design. A key element in transactional evaluation, however, is
that representatives of the different groups be brought together so that
conflicts can be brought to the surface, confronted, and resolved.
Obviously, this is feasible only in relatively small-scale studies or in large-scale
studies for which budgets are sufficiently large to allow people to come
together. In holistic evaluation, there is not a great deal of emphasis upon
formally bringing the different groups together; whether or not it is done
depends upon the particular situation. Confrontation and the resolution of
conflict are not strategic parts of holistic evaluation's foundation.

The collection of data has always been valued as a respected academic
pursuit. But dissemination, other than through traditional journals and
scholarly association meetings, has not been a responsibility accepted by
evaluators. In many evaluations, the emphasis is placed on the
dissemination of information to the upper levels of management, the top decision
makers only. Little feedback is provided for personnel directly involved in
the program, let alone persons who are not involved directly but whose
decisions nevertheless affect the program operating within their organizational
jurisdiction. Transactional and holistic projects are responsive to the
information needs of a broad audience, from local program staff to
institutional administrators, system officials, and legislative policy makers.

Finally, holistic evaluation and transactional evaluation provide two
legitimate alternatives that can be considered when experimental or
quasi-experimental designs cannot be applied, and they should thus be included in
every practicing evaluator's repertoire of program evaluation
methodologies.

UTILIZATION, QUALITY, AND ETHICS 

One further cluster of issues that must be addressed in depth concerns the
use of the evaluation results. A well-designed and well-conducted evaluation
improves the process of decision making; it eliminates, or at least greatly
reduces, the influence of political or self-serving factors; and it provides
objective, defensible evidence. Evaluation can lead to the planning of more
effective programs, since data-based evidence of what is working and what is
not is available to program planners. Evaluation increases the likelihood that
decisions will be wise and that subsequent policy will be rational. Why,
then, should the results of such a wonderful process be so universally
ignored? The fact that evaluation has generally had so little impact is well
documented (10, 25, 60, 66, 72, 98, 99).

Throughout this monograph, we have stressed the importance of providing
information for decisions regarding program improvement and decisions
regarding a program's future. But the reasons for undertaking real world
evaluations are not always so rational; nor are the underlying motives
always so nice. The actual use of the results of an evaluation often merely
reflects the reasons the evaluation was called for in the first place.

Some evaluations are little more than public relations rituals carried out to
satisfy taxpayers or other publics demanding accountability; others are
initiated merely to satisfy federal or state grant requirements. These
evaluations are conducted not because program staff really want to find out how
well their program is working, but because they have to evaluate if they want
to continue receiving the external funds necessary to continue the program.
In many of these cases, program staff really don't give a hoot about the
findings of the evaluation. The fact that it was conducted is enough in and of
itself.

In Popham's (60) view, many educational evaluations are carried out "in a
thoroughly practical milieu in which an evaluation's results will constitute
additional playing cards that people will be dealing from patently political
decks." Sometimes those decks are loaded. Politics is not confined to
program operations; it affects both the motives for evaluation and the
utilization of its results. Even the most dispassionately gathered, methodologically
perfect data can be used to justify a weak program or destroy a good one.
The best of evaluations can be undertaken for the worst reasons. Some are
undertaken merely as a ploy to get rid of an incompetent or uncooperative
administrator or staff person (60). Weiss (99) lists several other
less-than-legitimate reasons that evaluations are initiated: to delay decisions; to
provide support for or justify a program to "higher-ups"; to make a
successful program more viable and increase the prestige of the institution; or as a
means of self-glorification for the directors. Evaluations are also initiated to
appease program critics and because they are fashionable and lend a form of
professional validity to the program (60).

Along the same lines, Suchman (91) describes the following misuses
evaluation may serve: Eye-wash, White-wash, Submarine, Posture, and
Postponement. Eye-wash refers to deliberately selecting for evaluation only
those aspects of a program that look good on the surface in an attempt to
justify a weak or bad program. White-wash refers to attempts to cover up actual
program failures or errors. Submarine refers to attempts to destroy a
program regardless of its effectiveness, and Posture uses evaluation only as
a gesture of objectivity. Postponement is an attempt to delay needed action
by pretending to seek the facts.

Evaluators should not be surprised in such cases if their reports are laid
neatly to rest, albeit on a prominent shelf. And they should be amazed if
their reports are not buried altogether when the results are negative or run
counter to vested interests. Few administrators or program staff in the real
world are readily willing to accept evaluation results that may place the
survival of their program (and their jobs) in serious jeopardy. Only when
organizational personnel themselves are dissatisfied with a program will they be
receptive to the implications of a negative evaluation and take its results
seriously. Evaluators would save themselves a great deal of anguish if they
found out what the motives underlying the evaluation were and made sure
that the purposes were truly legitimate before they began.

Evaluators would also be wise to be attentive to some basic procedures
that seem to increase the likelihood that the results of an evaluation will be
used: 1) identify potential users early in the evaluation and address issues of
concern to them; 2) involve representatives from different constituencies in
the process of evaluation; 3) complete the evaluation promptly, according to
schedule; 4) prepare several forms of the report, including a nontechnical
summary for lay audiences; 5) provide individuals whose program is being
evaluated with a draft report so they will have an opportunity to critique the
report and prepare a rejoinder; 6) take responsibility for presenting the
findings to decision makers and interpreting them into action plans; and 7) be
available for advice or assistance in implementing recommendations even
after the evaluation has been completed.

The assumption in this discussion, of course, is that the evaluation report
is detailed and clearly indicates specific ways by which the program might be
improved. But many evaluators refuse to make suggestions or provide
direction for improvement, viewing their role as one of data gatherer and analyzer
only. Many evaluators provide only global recommendations that are simply
too sweeping to be practical. Many evaluators make recommendations that
are vague and open to varied interpretation. Yet, few of them are willing to
stick around long enough to interpret their data or help translate their
recommendations into action plans once their evaluation has been completed.
Evaluators who abdicate responsibility for follow-through invite
nonutilization of their results.

Finally, a major limitation on the use of evaluation data and a major issue
that must be addressed concerns the quality of the evaluation and the
evaluator. There is a tremendous gap between the rhetoric of evaluation and
its demonstrated performance. According to Weiss (98), "Much evaluation
is poor; more is mediocre," but, despite a few suggestions to evaluate
evaluations, there has not yet been developed a formal structure or
guidelines by which this can be accomplished. Guba (36) offers the following
criteria for "good" evaluations:

1. Internal validity: The evaluation information corresponds to the
phenomena which it purports to describe.

2. External validity: Most evaluations are unconcerned about
generalizability, but widespread application is important, particularly
in the case of social action programs, and here questions of sample
representativeness and the similarity of testing conditions become
important.

3. Reliability: Information provided by the instruments is consistent.

4. Objectivity: Equally competent, independent judges or observers
would agree with the results.

5. Relevance: The evaluation information relates to the original purpose
of the evaluation.

6. Importance: The information presented in the report is important.

7. Scope: The whole story is told with a wide range of information
including negative perceptions or facts.

8. Credibility: The client and other audiences of the evaluation trust the
evaluator and have confidence in the sources of information.

9. Timeliness: The information is prepared in time to meet the client's
needs.

10. Pervasiveness: All audiences who are entitled to it receive the
evaluation information.

11. Efficiency: The cost of the evaluation in terms of time, personnel, and
funds is appropriate to the utility of the evaluation information.

It is not easy to define criteria by which to evaluate evaluators.
Furthermore, as the pressures for evaluation increase and more and more
evaluations are required, the lure of the dollar will mount and evaluators will be
faced with many ethical choices and many threats to their integrity.
Evaluation has become a profitable enterprise, and suddenly people from all
walks of life are calling themselves evaluators in spite of the fact that they
may lack training in evaluation and do not possess the technical competence
to carry out quality evaluations. Having read a book or two or attended a
course on evaluation does not an evaluator make. Evaluation is difficult
even under the best of circumstances, and seldom do the best of
circumstances occur.

Evaluation is also a high stakes game, and it is not yet a well-honed
professional practice with a code of ethics or Hippocratic oath. At worst,
evaluators can become whores prostituting themselves for sufficient
incentives (34); at best, they can unconsciously shade information that might
harm a program to which they feel morally committed. If an attorney loses a
case, or a doctor a patient, the result is indeed grim for the losers and their
families. But evaluations play with larger numbers and the impact of a
negative evaluation is far-reaching. Programs are abolished and program
participants are deprived of services that they may have felt were valuable in
spite of the fact that the program was pronounced ineffective on the basis of
other criteria. Program staffs lose jobs and their families suffer. Evaluators
have the power to affect many lives, and their competence and integrity
assume monumental importance. Evaluators must be skilled and they must
be competent. Above all, they must be ethical.

Suppose the HOPS program really had been in jeopardy. Should such a
well-intentioned program have been protected from a frugal governor even if
it meant distorting some of the data? How could the anonymity promised to
individual colleges have been maintained if "good" programs had to be
identified in order to save them? There are an infinite number of questions
such as these confronting the evaluator, and the answers are anything but
simple.

In an effort to develop guidelines for educational evaluators that may be
used as a code of ethics, many of the major professional associations have
appointed special committees to consider the issue. A preliminary set of
standards developed by the ethics committee of Division H, American
Educational Research Association offers 11 statements of ethics for
evaluators as follows:

1. Evaluators should be independent to the extent that they follow
professional and personal standards. Evaluators should be free of
political interference or coercion, limited only by general policies of the
institution. Evaluators should be responsive to the needs of a client.

2. Mutual support and team work are ideal. A client-professional
relationship should exist where each can have due respect for the other, but
separate responsibilities. Evaluators should be accountable to the
clients, but not subordinate to them.

3. Political and social contexts exist and should be duly considered when
reporting findings. The true outcomes of the studies should be reported,
regardless of other factors.

4. Evaluator values may be expressed in the report, but should be
identified clearly as personal judgments. Values and personal biases of
the evaluator should be made known to the client.

5. The evaluator has the primary responsibility for design and
methodology and should make the final decisions on them. The design and
methodology should be agreed upon by the user before implementation.

6. Review of the design, instrumentation, and other aspects of the
evaluation by the client and fellow professionals should be sought.

7. It is an essential responsibility of the evaluator to be honest in reporting
limitations and/or constraints of the evaluation.

8. Negative findings should be treated the same as positive findings when
reporting to the client.

9. Release of results should be dependent upon the terms of the contract
between the evaluator and client.

10. The names of individual subjects should be kept confidential at all
times, in accordance with federal law.

11. The evaluator should not accept an evaluation contract when evaluator
ethics and bias are at stake.

(Division H Newsletter, v. III, July, 1977)

A joint committee composed of representatives from AERA, the
American Psychological Association, and other national organizations has
developed preliminary guidelines that are currently being reviewed by
prominent educational evaluators. These guidelines, which have not yet
been released to the public, cover everything from the scope of the
information and timeliness of the report to the fiscal responsibility, diplomacy, and
formal obligation of the evaluator.

In addition to association committees on ethics, several leaders in
evaluation, such as Michael Scriven, Robert Stake, and Blaine Worthen, have also
begun to address the issue of evaluator ethics in their writings, and no doubt
evaluators will eventually be able to turn to these documents for guidance.
In the meantime, it is important to recognize that evaluation is an area that is
fraught with debatable questions of ethics and moral implications. Until such
time that definitive guidelines are available, evaluators must be scrupulously
circumspect and conscientious. They should approach evaluation as a
constructive process, viewing the goal of evaluation as improvement, and
when in doubt, they should remember the immortal words of the patron saint
of evaluators who said, "Let your conscience be your guide."